ABSTRACT



The invention is directed to isolated polypeptides bearing sequence homology to the Sp36 protein found in pneumococcal organisms, such as Streptococcus pneumoniae. Polynucleotides encoding such polypeptides are also disclosed. The invention also relates to antibodies specific for the disclosed polypeptides and to uses of such antibodies in the treatment of diseases caused by staphylococci as well as group A and B streptococci. In addition, the invention relates to the use of the disclosed polypeptides in compositions and as vaccines, and to prophylactic uses such as the vaccination of animals, especially humans, against a wide variety of streptococcal, staphylococcal and other diseases.

8 Claims, 9 Drawing Sheets 
Val Glu His Pro Asp Glu Arg Pro His Ser Asn Asp Gly Trp Gly Asn
690 695 700 

Ala Ser Glu His Val Leu Gly Lys Lys Asp His Ser Glu Asp Pro Asn
705 710 715 720 

Lys Asn Phe Lys Ala Asp Glu Glu Pro Val Glu Glu Thr Pro Ala Glu 
725 730 735 

Pro Glu Val Pro Gln Val Glu Thr Glu Lys Val Glu Ala Gln Leu Lys
740 745 750 

Glu Ala Glu Val Leu Leu Ala Lys Val Thr Asp Ser Ser Leu Lys Ala 
755 760 765 

Asn Ala Thr Glu Thr Leu Ala Gly Leu Arg Asn Asn Leu Thr Leu Gln
770 775 780

Ile Met Asp Asn Asn Ser Ile Met Ala Glu Ala Glu Lys Leu Leu Ala
785 790 795 800 



Leu Leu Lys Gly Ser Asn Pro Ser Ser Val Ser Lys Glu Lys Ile Asn
805 810 815 



What is claimed is:

1. An isolated polypeptide comprising an amino acid
sequence with at least 95% sequence identity to the
sequence of SEQ ID NO: 4 and wherein said polypeptide
binds to an antibody that is specific for Sp36 (SEQ ID NO: 7).

2. An isolated polypeptide comprising an amino acid 
sequence with at least 95% sequence identity to a sequence 
selected from the group consisting of SEQ ID NO: 2 and 4 
wherein said polypeptide is identical to that found in an 
organism selected from the group consisting of Group A 
streptococci and Staphylococcus aureus and wherein said 
polypeptide binds to an antibody that is specific for Sp36 
(SEQ ID NO: 7). 

3. The isolated polypeptide of claim 2 wherein said Group 
A organism is Streptococcus pyogenes. 

4. The isolated polypeptide of claim 2 wherein said 
organism is Staphylococcus aureus. 



5. An isolated polypeptide comprising an amino acid
sequence at least 95% identical to the sequence of SEQ ID
NO: 4 and wherein said polypeptide has a sequence with at
least 12.6% sequence identity to the amino acid sequence of
the Sp36 protein (SEQ ID NO: 7) of Streptococcus pneumoniae
and wherein said isolated polypeptide binds to an
antibody that is specific for Sp36.

6. An isolated polypeptide comprising the sequence of
SEQ ID NO: 2 wherein said isolated polypeptide binds to an
antibody that is specific for Sp36 (SEQ ID NO: 7) of
Streptococcus pneumoniae.

7. An isolated polypeptide comprising the amino acid 
sequence of SEQ ID NO: 2. 

8. An isolated polypeptide comprising the amino acid
sequence of SEQ ID NO: 4.



ID AF291695 standard; genomic DNA; PRO; 2541 BP. 
XX 

AC AF291695; 
XX 

SV AF291695.1
XX 

DT 19-MAR-2001 (Rel. 67, Created) 

DT 19-MAR-2001 (Rel. 67, Last updated, Version 1) 
XX 

DE Streptococcus pneumoniae pneumococcal histidine triad A protein (phtA) 

DE gene, complete cds. 

XX 

KW 

XX 

OS Streptococcus pneumoniae 

OC Bacteria; Firmicutes; Lactobacillales; Streptococcaceae; Streptococcus. 
XX 

RN [1] 

RP 1-2541 

RX DOI; 10.1128/IAI.69.3.1593-1598.2001.

RX PUBMED; 11179332.

RA Wizemann T.M., Heinrichs J.H., Adamou J.E., Erwin A.L., Kunsch C.,

RA Choi G.H., Barash S.C., Rosen C.A., Masure H.R., Tuomanen E., Gayle A.,

RA Brewah Y.A., Walsh W., Barren P., Lathigra R., Hanson M., Langermann S.,

RA Johnson S., Koenig S.;

RT "Use of a whole genome approach to identify vaccine molecules affording

RT protection against Streptococcus pneumoniae infection.";

RL Infect. Immun. 69(3):1593-1598(2001).
XX 

RN [2] 

RP 1-2541 

RA Choi G.H.;

RT 

RL Submitted (01-AUG-2000) to the EMBL/GenBank/DDBJ databases.

RL Molecular Biology, Human Genome Sciences, Inc., 9410 Key West Ave., 

RL Rockville, MD 20850, USA 



XX 

FH Key Location/Qualifiers 
FH 

FT source 1..2541

FT /db_xref="taxon:1313"

FT /mol_type="genomic DNA"

FT /organism="Streptococcus pneumoniae"

FT /strain="N4"

FT CDS 91..2541

FT /codon_start=1

FT /db_xref="InterPro:IPR006270"

FT /db_xref="UniProt/TrEMBL:Q9AHT9"

FT /note="PhtA"

FT /transl_table=11

FT /gene="phtA"

FT /product="pneumococcal histidine triad A protein"

FT /protein_id="AAK19155.1"
FT /translation="MKINKKYLVGSAAALILSVCSYELGLYQARTVKENNRVSYIDGKQ

FT ATQKTENLTPDEVSKREGINAEQIVIKITDQGYVTSHGDHYHYYNGKVPYDAIISEELL

FT MKDPNYKLKDEDIVNEVKGGYVIKVDGKYYVYLKDAAHADNVRTKEEINRQKQEHSQHR

FT EGGTPRNDGAVALARSQGRYTTDDGYIFNASDIIEDTGDAYIVPHGDHYHYIPKNELSA

FT SELAAAEAFLSGRGNLSNSRTYRRQNSDNTSRTNWVPSVSNPGTTNTNTSNNSNTNSQA

FT SQSNDIDSLLKQLYKLPLSQRHVESDGLVFDPAQITSRTARGVAVPHGDHYHFIPYSQM

FT SELEERIARIIPLRYRSNHWVPDSRPEQPSPQPTPEPSPGPQPAPNLKIDSNSSLVSQL

FT VRKVGEGYVFEEKGISRYVFAKDLPSETVKNLESKLSKQESVSHTLTAKKENVAPRDQE

FT FYDKAYNLLTEAHKALFXNKGRNSDFQALDKLLERLNDESTNKEKLVDDLLAFLAPITH

FT PERLGKPNSQIEYTEDEVRIAQLADKYTTSDGYIFDEHDIISDEGDAYVTPHMGHSHWI

FT GKDSLSDKEKVAAQAYTKEKGILPPSPDADVKANPTGDSAAAIYNRVKGEKRIPLVRLP

FT YMVEHTVEVKNGNLIIPHKDHYHNIKFAWFDDHTYKAPNGYTLEDLFATIKYYVEHPDE

FT RPHSNDGWGNASEHVLGKKDHSEDPNKNFKADEEPVEETPAEPEVPQVETEKVEAQLKE

FT AEVLLAKVTDSSLKANATETLAGLRNNLTLQIMDNNSIMAEAEKLLALLKGSNPSSVSK

FT EKIN"
XX 

SQ Sequence 2541 BP; 888 A; 476 C; 516 G; 660 T; 1 other; 



taaactatta accagttaag taatagagag gagtttctgc aatttagaaa tgaattgcaa 60 

ctagaaatat caaatagaaa gagagtttcg atgaaaatta ataagaaata ccttgttggt 120 

tctgcggcag ctttgatttt aagtgtttgt tcttacgagt tgggactgta tcaagctaga 180 

acggttaagg aaaataatcg tgtttcctat atagatggaa aacaagcgac gcaaaaaacg 240 

gagaatttga ctcctgatga ggttagcaag cgtgaaggaa tcaatgctga gcaaatcgtc 300 

atcaagataa cagaccaagg ctatgtcact tcacatggcg accactatca ttattacaat 360 

ggtaaggttc cttatgacgc tatcatcagt gaagaattac tcatgaaaga tccaaactat 420 

aagctaaaag atgaggatat tgttaatgag gtcaagggtg gatatgttat caaggtagat 480 

ggaaaatact atgtttacct taaggatgct gcccacgcgg ataacgtccg tacaaaagag 540 

gaaatcaatc gacaaaaaca agagcatagt caacatcgtg aaggtggaac tccaagaaac 600 

gatggtgctg ttgccttggc acgttcgcaa ggacgctata ctacagatga tggttatatc 660 

tttaatgctt ctgatatcat agaggatact ggtgatgctt atatcgttcc tcatggagat 720 

cattaccatt acattcctaa gaatgagtta tcagctagcg agttggctgc tgcagaagcc 780 

ttcctatctg gtcgaggaaa tctgtcaaat tcaagaacct atcgccgaca aaatagcgat 840 

aacacttcaa gaacaaactg ggtaccttct gtaagcaatc caggaactac aaatactaac 900 

acaagcaaca acagcaacac taacagtcaa gcaagtcaaa gtaatgacat tgatagtctc 960 

ttgaaacagc tctacaaact gcctttgagt caacgacatg tagaatctga tggccttgtc 1020 

tttgatccag cacaaatcac aagtcgaaca gctagaggtg ttgcagtgcc acacggagat 1080 

cattaccact tcatccctta ctctcaaatg tctgaattgg aagaacgaat cgctcgtatt 1140 

attccccttc gttatcgttc aaaccattgg gtaccagatt caaggccaga acaaccaagt 1200 

ccacaaccga ctccggaacc tagtccaggc ccgcaacctg caccaaatct taaaatagac 1260 

tcaaattctt ctttggttag tcagctggta cgaaaagttg gggaaggata tgtattcgaa 1320 

gaaaagggca tctctcgtta tgtctttgcg aaagatttac catctgaaac tgttaaaaat 1380 

cttgaaagca agttatcaaa acaagagagt gtttcacaca ctttaactgc taaaaaagaa 1440 

aatgttgctc ctcgtgacca agaattttat gataaagcat ataatctgtt aactgaggct 1500 

cataaagcct tgtttgnaaa taagggtcgt aattctgatt tccaagcctt agacaaatta 1560 

ttagaacgct tgaatgatga atcgactaat aaagaaaaat tggtagatga tttattggca 1620 

ttcctagcac caattaccca tccagagcga cttggcaaac caaattctca aattgagtat 1680 

actgaagacg aagttcgtat tgctcaatta gctgataagt atacaacgtc agatggttac 1740 

atttttgatg aacatgatat aatcagtgat gaaggagatg catatgtaac gcctcatatg 1800 

ggccatagtc actggattgg aaaagatagc ctttctgata aggaaaaagt tgcagctcaa 1860

gcctatacta aagaaaaagg tatcctacct ccatctccag acgcagatgt taaagcaaat 1920 

ccaactggag atagtgcagc agctatttac aatcgtgtga aaggggaaaa acgaattcca 1980 

ctcgttcgac ttccatatat ggttgagcat acagttgagg ttaaaaacgg taatttgatt 2040 

attcctcata aggatcatta ccataatatt aaatttgctt ggtttgatga tcacacatac 2100 

aaagctccaa atggctatac cttggaagat ttgtttgcga cgattaagta ctacgtagaa 2160 

caccctgacg aacgtccaca ttctaatgat ggatggggca atgccagtga gcatgtgtta 2220 

ggcaagaaag accacagtga agatccaaat aagaacttca aagcggatga agagccagta 2280 

gaggaaacac ctgctgagcc agaagtccct caagtagaga ctgaaaaagt agaagcccaa 2340 

ctcaaagaag cagaagtttt gcttgcgaaa gtaacggatt ctagtctgaa agccaatgca 2400 

acagaaactc tagctggttt acgaaataat ttgactcttc aaattatgga taacaatagt 2460 

atcatggcag aagcagaaaa attacttgcg ttgttaaaag gaagtaatcc ttcatctgta 2520 

agtaaggaaa aaataaacta a 2541 

// 
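The SQ line above is a plain base count, and the /translation qualifier is the CDS (91..2541, 1-based and inclusive) read codon by codon. The Python sketch below shows both calculations. It assumes the sequence has already been collected into one string with spacing and position numbers removed, and it hard-codes the NCBI standard code, which assigns the same amino acids as transl_table=11. The function names are hypothetical helpers, not part of any EMBL tool.

```python
from collections import Counter

# Base order used to index the codon table: T, C, A, G (NCBI convention).
BASES = "TCAG"
# NCBI standard genetic code. Translation table 11 assigns the same amino
# acids; it differs from table 1 only in which codons may act as starts.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def base_composition(seq: str) -> dict[str, int]:
    """Count A, C, G and T; any other symbol (e.g. 'n') is tallied as 'other'."""
    counts = Counter(seq.upper())
    result = {b: counts.get(b, 0) for b in "ACGT"}
    result["other"] = sum(n for b, n in counts.items() if b not in "ACGT")
    return result


def cds(seq: str, start: int, end: int) -> str:
    """Extract a 1-based, inclusive location such as 91..2541."""
    return seq[start - 1:end]


def translate(dna: str) -> str:
    """Translate codon by codon; a codon containing an ambiguity code gives 'X'."""
    protein = []
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = dna[i:i + 3].upper()
        if all(b in BASES for b in codon):
            index = (16 * BASES.index(codon[0])
                     + 4 * BASES.index(codon[1])
                     + BASES.index(codon[2]))
            protein.append(AAS[index])
        else:
            protein.append("X")
    return "".join(protein).rstrip("*")  # drop the terminal stop codon


if __name__ == "__main__":
    # First ten codons of the phtA CDS (positions 91-120 of the entry).
    print(translate("atgaaaattaataagaaataccttgttggt"))  # MKINKKYLVG
```

Applied to the AF291695 sequence, `translate(cds(seq, 91, 2541))` should give the 816-residue PhtA translation, with the ambiguous `n` at position 1517 producing the single X, and `base_composition(seq)` should reproduce the counts on the SQ line.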
[0077] Multiple coil structures formed by alpha-helical amino acid segments in the S. pneumoniae polypeptides according to the invention may be identified from the locations of proline-rich regions relative to one another. Further, the "X" region, optionally located between two or more alpha-helical structures, can contribute to the formation of a coil-within-a-coil structure. Standard three-dimensional protein modeling may be used to determine the relative shape of such structures. One example of a computer program useful for such modeling, the Paircoil Scoring Form Program ("PairCoil program"), is described by Berger et al., Proc. Natl. Acad. Sci. USA 92:8259-8263 (August 1995). The PairCoil program uses a mathematical algorithm to predict the locations of coiled-coil regions in amino acid sequences. A further example, the Multicoil Scoring Form Program ("MultiCoil program"), is described by Wolf et al., Protein Science 6:1179-1189 (June 1997). The MultiCoil program is based on the PairCoil algorithm and is useful for locating dimeric and trimeric coiled coils.
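PairCoil and MultiCoil themselves score pairwise residue correlations learned from known coiled coils. The Python sketch below is a much simpler single-residue stand-in, included only to illustrate the scanning idea: slide a window along the sequence, try all seven heptad registers (positions a to g), and keep the best score. The hydrophobic residue set, the proline penalty and the 28-residue (four-heptad) window are assumptions made for illustration, not parameters taken from either program.

```python
HEPTAD = "abcdefg"
# Assumed set of residues favoured at the hydrophobic core positions a and d.
CORE_HYDROPHOBIC = set("LIVMFA")


def heptad_score(window: str, register: int) -> float:
    """Score a window whose first residue sits at heptad position HEPTAD[register]."""
    score = 0
    for i, aa in enumerate(window):
        position = HEPTAD[(i + register) % 7]
        if aa == "P":
            score -= 1  # proline is a strong alpha-helix breaker
        elif position in "ad" and aa in CORE_HYDROPHOBIC:
            score += 1
    return score / len(window)


def scan(seq: str, window: int = 28) -> list[float]:
    """Best score over all seven registers for every window start position."""
    return [
        max(heptad_score(seq[s:s + window], r) for r in range(7))
        for s in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    print(scan("LKELEKK" * 4))            # one window: leucines at every a and d
    print(max(scan("PEPSPGPQPAPN" * 3)))  # proline-rich stretch scores below zero
```

A perfect repeat such as LKELEKK (leucine at positions a and d) scores 8/28 over four heptads, whereas the proline-rich PEPSPGPQPAPN stretch from the PhtA translation scores below zero in every window.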
tr Q97QM9 Conserved domain protein [SP1174] [Streptococcus pneumoniae] 819 AA

Score = 423 bits (1087), Expect = e-116

Identities = 211/357 (59%), Positives = 271/357 (75%), Gaps = 11/357 (3%)

Query: 1 MKFSKKYIAAGSAVIVSLSLCAYALNQHRS-QENKDNNRVSYVDGSQSSQKSENLTPDQV 59 

MK +KKY+A GS +++LS+C+Y L ++++ Q+ K++NRV+Y+DG Q+ QK+ENLTPD+V 
Sbjct: 1 MKINKKYLA-GSVAVLALSVCSYELGRYQAGQDKKESNRVAYIDGDQAGQKAENLTPDEV 59 

Query: 60 SQKEGIQAEQIVIKITDQGYVTSHGDHYHYYNGKVPYDALFSEELLMKDPNYQLKDADIV 119 

S++EGI AEQIVIKITDQGYVTSHGDHYHYYNGKVPYDA+ SEELLMKDPNYQLKD+DIV
Sbjct: 60 SKREGINAEQIVIKITDQGYVTSHGDHYHYYNGKVPYDAIISEELLMKDPNYQLKDSDIV 119

Query: 120 NEVKGGYI IKVDGKYYWLKDAAHADNWTKDEINRQK^ 178 

NE+KGGY+IKV+GKYYVYLKDAAHADN+RTK+EI RQKQE + N + H VA AR+QG 
Sbjct: 120 NEIKGGYVIKVNGKYYVYLKDAAHADNIRTKEEIKRQKQERSHNHNSRADNAVAAARAQG 179

Query: 179 RYTTNDGYVFNPADIIEDTGNAYIVPHGGHYHYIPXXXXXXXXXXXXXXXXXXXNMQPSQ 238 

RYTT+DGY+FN +DIIEDTG+AYIVPHG HYHYIP                     Q S+

Sbjct: 180 RYTTDDGYIFNASDIIEDTGDAYIVPHGDHYHYIP--KNELSASELAAAEAYWNGKQGSR 237

Query: 239 LSYSSTASDNNTQ SVAKGSTSKPA NKSENLQSLLKELYDSPSAQRYSESDGLVF 292

S SS+ + N Q S TP N+ EN+ SLL+ELY P + +R+ ESDGL+F 

Sbjct: 238 PSSSSSYNANPAQPRLSENHNLTVTPTYHQNQGENISSLLRELYAKPLSERHVESDGLIF 297 

Query: 293 DPAKIISRTPNGVAIPHGDHYHFIPYSKLSALEEKIARRVPISGTGSTVSTNAKPNE 349 

DPA+I SRT  GVA+PHG+HYHFIPY ++S LE++IAR +P+    +    +++P E
Sbjct: 298 DPAQITSRTARGVAVPHGNHYHFIPYEQMSELEKRIARIIPLRYRSNHWVPDSRPEE 354
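The Identities, Positives and Gaps figures are column counts over the aligned length (357 columns here). The Python sketch below computes identities and gaps from two aligned rows. It assumes equal-length rows with '-' marking gaps, and it leaves out Positives, which would also need a substitution matrix such as BLOSUM62. Integer truncation is assumed for the percentages, which matches the report above (271/357 is 75.9% but is shown as 75%). The function names are hypothetical.

```python
def alignment_stats(query: str, sbjct: str) -> dict[str, int]:
    """Count identical and gapped columns in a pre-computed pairwise alignment."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    gaps = sum(1 for q, s in zip(query, sbjct) if q == "-" or s == "-")
    return {"identities": identities, "gaps": gaps, "length": len(query)}


def percent(part: int, whole: int) -> int:
    """Percentage truncated to an integer, e.g. 211/357 -> 59."""
    return 100 * part // whole


if __name__ == "__main__":
    # First aligned block above (Query 1-59 against Sbjct 1-59).
    q = "MKFSKKYIAAGSAVIVSLSLCAYALNQHRS-QENKDNNRVSYVDGSQSSQKSENLTPDQV"
    s = "MKINKKYLA-GSVAVLALSVCSYELGRYQAGQDKKESNRVAYIDGDQAGQKAENLTPDEV"
    stats = alignment_stats(q, s)
    print(stats, percent(stats["identities"], stats["length"]))
```

Run on the first aligned block, it counts 31 identical columns and 2 gap columns out of 60.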
[Figure: graphical overview of database matches plotted against residue position (axis marked 1, 500, 1000) for UniProt entries Q8DQ87, Q9ANY1, Q6HNQ5, Q8CMR4, Q8DPQ2, Q9AG74, Q9AHT9, Q8DQ88, Q6T8D7, Q97QH8, Q9RNY2, Q97QH9, Q9RNY3 and Q6HNQ3.]
AGAAGCCTATTGGAATGGGAAGCAGGGATCTCGTCCTTCTTCAA
TCAACCAAGATTGTCAGAGAACCACAATCTGACTGTCACTCCAACTTATC 
CATTTCAAGCCTTTTACGTGAATTGTATGCTAAACCCTTATCAGAACGCCATGTGGAATCTC 
TATTTTCGACCCAGCGCAAATCACAAGTCGAACCGCCAGAGGTGTAGCTGTCCCTCATGGTAACCATTA 
CCACTTTATCCCTTATGAACAAATGTCTGAATTGGAAAAACGMTTGCTCGTATTATTCCCC 
TCGTTCAAACCATTGGGTACCAGATTCAAGACCAGAACAACCAAGTCCACAATCGACTCCGGAACCTAG 
TCCAAGTCCGCAACCTGCACCAAATCCTCAACCAGCTCCAAGCAATCCAATTGATGAGAAATTGGTCAA 
AGAAGCTGTTCGAAAAGTAGGCGATGGTTATGTCTTTGAGGAGAATGGAGTTTCTCGTTATATCCCAGC 
CAAGGATCTTTCAGCAGAAACAGCAGCAGGCATTCATAGCAAACTGGCCAAGCAGGAAAGTTTATCTCA 
TAAGCTAGGAGCTAAGAAAACTGACCTCCCATCTAGTGATCGAGAATTTTACAATAAGGCTTATGACTT 
ACTAGCAAGAATTCACCAAGATTTACTTGATAATAAAGGTCGACAAGTTGATTTTGAGGCTTTGGATAA 
CCTGTTGGAACGACTCAAGGATGTCNCAAGTGATAAAGTCAAGTTAGTGGANGATATTCTTGCCTTCTT 
AGCTCCGATTCGTCATCCAGAACGTTTAGGAAAACCAAATGCGCAAATTACCTACACTGATGATGAGAT 
TCAAGTAGCCAAGTTGGCAGGCAAGTACACAACAGAAGACGGTTATATCTTTGATCCTCGTGATATAAC 
CAGTGATGAGGGGGATGCCTATGTAACTCCACATATGACCCATAGCCACTGGATTAAAAAAGATAGTTT
GTCTGAAGCTGAGAGAGCGGCAGCCCAGGCTTATGCTAAAGAGAAAGGTTTGACCCCTCCTTCGACAGA 
CCATCAGGATTCAGGAAATACTGAGGCAAAAGGAGCAGAAGCTATCTACAACCGCGTGAAAGCAGCTAA 
GAAGGTGCCACTTGATCGTATGCCTTACAATCTTCAATATACTGTAGAAGTCAAAAACGGTAGTTTAAT 
CATACCTCATTATGACCATTACCATAACATCAAATTTGAGTGGTTTGACGAAGGCCTTTATGAGGCACC 
TAAGGGGTATACTCTTGAGGATCTTTTGGCGACTGTCAAGTACTATGTCGAACATCCAAACGAACGTCC 
GCATTCAGATAATGGTTTTGGTAACGCTAGCGACCATGTTCAAAGAAACAAAAATGGTCAAGCTGATAC 
CAATCAAACGGAAAAACCAAGCGAGGAGAAACCTCAGACAGAAAAACCTGAGGAAGAAACCCCTCGAGA 
AGAGAAACCGCAAAGCGAGAAACCAGAGTCTCCAAAACCAACAGAGGAACCAGAAGAATCACCAGAGGA 
ATCAGAACAACCTCAGGTCGAGACTGAAAAGGTTGAAGAAAAACTGAGAGAGGCTGAAGATTTACTTGG 
TCCAGGAT

SP042 amino acid (SEQ ID NO: 66)
CSYELGRHQAGQVKKESNRVSYIDGDQAGQKAENLTPDEVSKREGINAEQXVIKITDQGYVTSHGDHYH
YYNGKVPYDAIISEELLMKDPNYQLKDSDIVNEIKGGYVIKVNGKYYV^

QERSHNHNSRADNAVAAARAQGRYTTDDGYIFNASDIIEDTGDAYIVPHGDHYHYIPKNELSASELAAA

EAYWNGKQGSRPSSSSSYNANPAQPRLSENHNLTVTPTYHQNQGENISSLLRELYAKPLSERHVESDGL

IFDPAQITSRTARGVAVPHGNHYHFIPYEQMSELEKRIARIIPLRYRSNHWVPDSRPEQPSPQSTPEPS

PSPQPAPNPQPAPSNPIDEKLVKEAVRKVGDGYVFEENGVSRYIPAKDLSAETAAGIDSKLAKQESLSH

KLGAKKTDLPSSDREFYNKAYDLLARIHQDLLDNKGRQVDFEALDNLLERLKDVXSDKVKLVXDILAFL

APIRHPERLGKPNAQITYTDDEIQVAKLAGKYTTEDGYIFDPRDITSDEGDAYVTPHMTHSHWIKKDSL

SEAERAAAQAYAKEKGLTPPSTDHQDSGNTEAKGAEAIYNRVKAAKKVPLDRMPY

IPHYDHYHNIKFEWFDEGLYEAPKGYTLEDLLATVKYYVEHPNERPHSDNGFGNASDHVQRNKNGQADT

NQTEKPSEEKPQTEKPEEETPREEKPQSEKPESPKPTEEPEESPEESEEPQVETEKVEEKLREAEDLLG

KIQD 

SP043 nucleotide (SEQ ID NO: 67)

TTATAAGGGTGAATTAGAAAAAGGATACCAATTTGATGGTTGGGAAATTTCTGGTTTCGAAGGTAAAAA 
AGACGCTGGCTATGTTATTAATCTATCAAAAGATACCTTTATAAAACCTGTATTCAAGAAAATAGAGGA 
GAAAAAGGAGGAAGAAAATAAACCTACTTTTGATGTATCGAAAAAGAAAGATAACCCACAAGTAAACCA 
TAGTCAATTAAATGAAAGTCACAGAAAAGAGGATTTACAAAGAGAAGAGCATTCACAAAAATCTGATTC 
AACTAAGGATGTTACAGCTACAGTTCTTGATAAAAACAATATCAGTAGTAAATCAACTACTAACAATCC 
TAATAAG 

SP043 amino acid (SEQ ID NO: 68)

YKGELEKGYQFDGWEISGFEGKKDAGYVINLSKDTFIKPVFKKIEEKKEEENKPTFDVSKKKDNPQVNH 
SQLNESHRKEDLQREEHSQKSDSTKDVTATVLDKNNISSKSTTNNPNK 

SP044 nucleotide (SEQ ID NO: 69)

GAATCTTCAGGCTCAAGAAAGTTCAGGAAATAAAATCCACTTTATCAAT^JTTCAAGAAGGTGGCAGTGA 
TGCGATTATTCTTGAAAGCAATGGACATTTTGCCATGGTGGATACAGGAGAAGATTATGATTTCCCAGA 
TGGAAGTGATTCTCGCTATCCATGGAGAGAAGGAATTGAAACGTCTTATAAGCATGTTCTAACAGACCG 
TGTCTTTCGTCGTTTGAAGGAATTGGGTGTCCAAAAACTTGAT1TTATTTTGGTGACCCATACCCACAG 
TGATCATATTGGAAATCTTGATGAATTACTGTCTACCTATCCAGTTGACCGAGTCTATCTTAAGAAATA 
TGTCAGAATT AACATCTCCA AACGCTGTTC TTGAATCGCT CATTCTGATA CCATTTTCTG 10200

CACAATAAAC CAATACACGA TTATAGGCTT CTGTACATTT AACCACTATA TACAATTCAA 10260 

TCATTTTAGA ACGATTTTGC AGATATTTTT TTAGTGGTTG GAACATGGAT ATCACACCCC 10320 

AAACAGAAAT GGCTACTAAA AGAGCTCCCT CATAAGG 10357 



(2) INFORMATION FOR SEQ ID NO: 192: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6867 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 192:

CCGGACATTC TCAATCTTCT CTCTTTTGTT TTTCTCTTCT TTCTATGATA CAATGGAAAA 60 

AATAAATTCA AAAGGAGTTT TTTTATGACT TATCCAAATC TCTTCGACCG CTTCTTAACC 120 

TATGTTAAGG TCAACACGCG CTCTGATGAA CACTCTACTA CTACTCCAAG TACACAGAGT 180 

CAGGTTGACT TCGCAACAAA TGTCCTAATT CCTCAAATGA AACGTGTTGG ACTGCAAAAT 240 

GTTTACTATC TACCGAATGG TTTTGCTATT GGAACCTTGC CAGCCAACGA TCCGTCTTTA 300 

ACACGTAAGA TTGGTTTTAT ATCGCACATG GATACTGCTG ATTTTAATGC TGAAGGAGTC 360 

AATCCACAGG TAATTGAAAA CTACGATGGT GGTGTGATTG AACTAGGGAA TTCTGGTTTC 420 

AAACTCGATC CAGCTGACTT CAAGAGTCTT GAAAAATATC CAGGACAAAC GCTCATCACA 480 

ACAGATGGAA CAACCTTGCT AGGTGCTGAT GACAAGTCAG GAATTGCTGA AATTATGACA 540 

GCCATTGAAT ATCTAACTGC TCATCCTGAA ATTAAGCACT GTGAGATTCG TGTTGGTTTT 600 

GGTCCAGATG AAGAAATCGG TGTTGGTGCC AATAAATTTC ATGCAGAAGA TTTTGATGTG 660 

GATTTTGCCT ACACTGTTGA TGGTGGTCCA CTAGGTGAAC TTCAGTACGA GACTTTCTCA 720 

GCCGCTGGTG CTGAATTGCA TTTCCAAGGT CGTAATGTCC ACCCTGGTAC TGCCAAAGGG 780 

CAGATGGTCA ATGCCCTTCA GCTAGCAATT GATTTTCATA ATCAACTTCC AGAAAATGAC 840 

CGACCTGAGT TAACTGAAGG TTACCAAGGT TTTTACCATC TAATGGATGT GACAGGTAGT 900 

GTTGAGGAGG CGCGTGCAAG CTACATCATT CGTGATTTTG AAAAAGATGC CTTTGAAGCG 960 

CGTAAAGCAT CCATGCAATC TATCGCTGAT AAGATGAATG AAGAACTTGG GAGCGACCGT 1020 

GTCACTCTCA ACTTGACAGA CCAGTACTAC AATATGAAAG AAGTCATTGA AAAAGATATG 1080 

ACTCCAATTA CCATTGCTAA AGCCGTTATG GAAGATCTAG GTATCACGCC TATTATCGAA 1140 
CCAATCCCGG CTGGAACAGA CGGCTCTAAG ATTTCCTTTA TGGGAATCCC AACTCCGAAT 1200

ATCTTTGCAG CTGGCGAAAA TATGCACGGA CGTTTTGAAT ACGTTAGCCT TCAGACTATG 1260

GAACGPGCAG TTGATACCAT CATTGGCATC GTAGCTTATA AAGGCTAAAA AGACGAGGTA 1320

CCTCAGCTAC TTCGCCTTTC TTTTTATTCT ACIWJTriT CTTGATTTCC AGTAGTTGTA 1380

GAAGATTCTG TTGTTTCATT TTCTGAAGTT GATTCAGCAG GTTTAGAATC TCTTGTATTG 1440

CTTGGTTTGT TTTCGTCGCT AGCAGTTTCA ATGTTAGATT CTGCAGTTGC GTTTGCTTGG 1500

TTCTCAGCAC TGGTGTTATC ACCATTTCCT TCAGCATTTC TTGCTGGACT TGTTTCTTCA 1560

CTTGCGCTAG CTTTTCACTG GATTTGATGA TTCAAAACTA GAATAGCTTT TGTCGATTCA 1620

AGTAAAGCTG TTTTGTCTTT ACTCTTAGCA GAAAGTTGAT CTAATAATGC ATCCACCTTA 1680

TCAAACTCCG CATCAGATCC ATTATTACTT TCTAAATAAG AGTGAAGCGA CATGAGAATA 1740

TCGTAGAGTT TTTGATAGAG TACAAGTGTC TGAGGATCTT GCTCAGCATT TTCCTTTTCT 1800

TGTTGAAGGG CGCTACCGAT ACGAGTCAAG ACATCTTTTA CCTCACTGTT TACTTCATCC 1860

AAGTCTGCAT CACCCTTGIT TGTCCCACCT TTTAGATTTT CTACTTCTTC TGCCAAGGAT 1920

TGTCTGATTC CTTCTTCATG GATTTCTTCC AACAGTTGAT TTGCCTTGCT CAAAACACTT 1980

TCTACTTCTT CCTTGCTATC TGTCGCAGAT TATTGGTTGC TATCTACCAT GTACTCCTAA 2040

AACAGGAGAG TTATAATCCA AGATTACAAG GCCTTACAGA AATAAGAAAT CCAGATAACA 2100

CAATGTTCCT CCAAGACGCT ATTCGCTTCG CACAGCAGCA CGGATTCAAT ATGCTTTAAT 2160

TTTAAAGTTT AGGTGTCAAG ACCTCTTTTT AGTGTGCCCA AAATTTAGAG AAGTAATCAA 2220

TCAACTAACT TTTATTTTTT TCAAACTTTC AGTAAACTGA CCTAAAGCTA ACTCAATCTC 2280

TCTTTGTAGA TGCTTCTGCT ATCAGCTAGA AGTTGATCTA CTTTTGCCAA GACTGCCTTC 2340

TCATCAAAAG TTCCACGTTG ATAGTTGGAT 1GCAGGGATG GAATCTTGTT TTTCAAAGCC 2400

GCTTCATATC CCTTAGTTTG AACCTTGATG TACTGATTGT GGTCGCCATG AGGAATCACA 2460

AAACCTTCTG AATCTTCACT TATAATTCGA TTGGCATCAA AACCATGACC ATCTTCTTCC 2520

TCATGATGGA CATGTAGTGA CGGATTACTT AATACAGAAC TAGAAGAACT TCCTACCTCT 2580

TCCGTGTTAG AGTGTGATGG GGGATTGTTA AGAGATGACT TAGGAATATA GTCATACTGA 2640

TCCCCATGTC TTACTATATA AGCATCACCT GTATCTCTGA CAATATCATT AGGGTTAAAG 2700

ACATATGTGG CTGCTAATTC ACCTGCCGAC AAGTCACTCT CAGGAATGAA ATGATAGTGA 2760

CCACCATGTG GTACTATAGT AGATTGAAAT AGAATATGAG CAAATTGATA AGGGGATTTT 2820

AAAGTAATTT CTAACAATGA TTTAGAAACT ATGATGTGCT ATTCTAAATT CAACTCACTA 2880

TATATAACCA TCATCGGTAG TATAACGTCC CTGTAATTTT GCTACAGATA CTTCTGCACT 2940

AGCTCCTTTA TOGTCTTTAC CATGTTCTTG TTTTTCGCGA TTGATTTCAT CTTTTGTTCG 3000

TACATTTTCT GCATGAGCTT GATCTTTAAG GTAAACATAA TACTTTCCAT CTACCTTAAT 3060

AATATATCCT CCCTTAACCT AACTGACGAT ATCTPGATCT TTCGGCTGAT AGTTGGGGGC 3120

TTTCATTAAT AGCTCTTCAC TAAAGAGCGC ATCAAAAGGA ACTTTACCAT TATAGTAGTG 3180

ATAATGATCC CCATGAGAAG TTACATAACC TTGATCTGTA ATCTTAATAA CAATTTGTTT 3240

TGCTTGAATT CCTTCTTTTT GACTAACCTA GTCTGGAGTC AAATTTTCAG TCTTCTTAGT 3300

GTCTTTATTA CTGTTTACAT ATGAAACACG ATTTTTATCT GTATTGGCCT GTTAGCTATG 3360

TTGGTTCAGA GCATAAACAC ACAGACTTAA GGAAAGGATA ACAACAGATC CAGCTGCTAT 3420

ATATTTCTTT TTAAATrrCA TAATTACCTC ATTTCTATAA TTATTTATAT GATGTCTTCA 3480

TTATTAAATG ATTAAATAAA TTAATTAACC AATTAATTAA CTAGTAAATA TTCCACCTCT 3540

TTTTAAGTTG TATGTCAAGA AATTTTATAT ATTAATAATA AAATGAAATT CTCCCAAAGT 3600

CAGAGTTTTA TTTCTAACTT TTGAGAGAAC TTCATTTTTG ATTCAGACTT TTTCTACTGC 3660

TATTCCTTAC GCTATGAGAT CAGATAAATT CTTTTTTATC ACTTCTCCAC TTGGCAATCT 3720

TAATTCAATC GTTCCATCCA TATTGAATAT AACACTATCT AAGCCTAATC CGTAACTAGC 3780

TGTAAATTTT TCTAATTTTT CTTGTACAGG ATCTACTGCT GGAGCTTCCT CTAATGCTGG 3840

ATCTAACATA GGGTCACTCC CCACATTCCC TTCTGGATTC AACATTCCAT TATCCGTTGA 3900

GTTTPCTGCT TTTACAGGTT TTTCGTTTGG TGCCTCTGGT AAAGAATCTG CTGGTTTATT 3960

TTCTQTTGGT TGGTTCTCAA CTCTTCCAGT AGATACTTTT CCATTTTCAG ATGGTTTATT 4020

TTCACCATTT CCTTGAGGTC CTTCTCCTOT AAAATCTGCC ATATTCTTTT TAATCACTTC 4080

TCCCGATCGT AAATATAATT CAATTGTTCC GTCCATATTA AALAAbALAl TTTt tMiL. TIT 4140

CATCCCATAA CTTTCAGCAA ATTTTGCTAC TTTTTCTTGT ACAGGATCCA CTGTAGGAAC 4200

TTCTTCTAAC GTTCAATTAC TACTACTATT CCCACTTTCA GAAAGTTTTT CTTTTTCTAC 4260

CTTCTCACTA GTCTTTGGTT CTTCTACCTT TTCATCAAGT TTTAAGTTTT CTTGTGCTTT 4320

ATTCCTTTTA AATTGTCGTA GAATACTTGG TTTATCACTT TCATTTTCTT TTTCCAAGAT 4380

AGGTACTTCC ACAATATAAC TCGATTGATT GTCCAAATAA GCATTTGCCA TGAAGGTTAC 4440

AGGAATTTTA TTTCCGGCCG TTCTGGTTGT TCCTTGGTTT AATTPCGGAA TCGGTAATTT 4500

GATTTCACCA ACTTTATAGT TATTTTCTAA ATAAGCATTT CCATGAAATT CATCAAACAC 4560

TCTGACTAAA GCATCAGTTC CTTTAGGCAC TGCAAATTGA GCCTTCACTC TTAAATAAGT 4620

ATCCCCTGCA TGGAAAGGAT AGAAAATCGT TTCACTCGCC ATTTTGTAAG CTAAAGAGGT 4680

TGGAACTGTA AATGTACCAT CATAACTTAC TTCTGGATAA TCTTTTGAAG CGATAGTATA 4740

CTTAAATGTT TGTCCTGGTA AATAAGGTTG ATCTAATTCA AAGTTTGCAA TATTCCCTAC 4800 

TCCTTCTCCA AATACTTTAC CACATACTTT CTCCAATACT TTTCCATCTG GTGTTATTAA 4860 

TTTTACTAGC ATATTGATAC CTAATTPTTT CTCCAATTCA GGCGGAAAAC TAAAAGAAAC 4920 

GCGTTTTTGA CCATTGGCTA GAGTAAACTT TTGATTATTA AACGTACTAT TTTTTAACAA 4980 

ATTAACAACA TTCGTTAATT CTTCTCCAGT ATAAACTTTA TTCCCTTCTT TTTTAGCAAC 5040 

TCCTTCTTCG GGTTTAAACA GTTCATAGTT ACTGTGAGAA TGACCAATTC CAACCGGTTT 5100 

ATGTTCATCA ATCGGATCTG CATGATGGTG ATCTCCATGC GCATAAATAA TCGCATTTTT 5160

TTCTTTATTC ACGACAATAC TTTCACGTTT GACACCATAT TGTTTCATAA TGCCAGCAAT 5220 

TTTTTCTTCG ATTTTTTTAT CTAAAPCTTT CATTTCTTTG GCATTACTTG GATAATCCTG 5280 

TTCATGAGAT GACAAAGAAT CTAATCCATT ATGACTAGTT TTAACTTCCT CTAAATGTTT 5340 

TTGCGCASCT TAATTTGCTC TTCTGTCAAG TCCTTCTTGA AGAAATAATG ATTGTGGTCT 5400 

CCGTGACTCA TGACAAAACC TGATTCATCT TCAGCGATAA TACGATTAGC ATCAAATCCG 5460 

TATCCATCTT CTTCATGTTT CTCATGTGAA GTTCCTGGAT TGATTGGAAG AGATGGAGAA 5520 

GGTGTTGCTA GACTATTGTT TGGAAGACTC GGTTGCCCAA TTTGATTTGA TTTTGGAATG 5580 

TAATGGAAAT GATCACCATG TCTTACAATA TAAGCTGTAG CCGTTTCTTC AACGATATCT 5640 

TTTGGATTAA AAATATAACC ATCAGATGCT GAAGAGAGCT CCTTACTTGT CGTTAAAGAA 5700 

GAAGGATTGC TTGAAAGACT GCCTAGACTA GACACTACTT CATTAGGTTT TGCATTTGTA 5760 

GAAACTGTAG AACCAGTTCC ACTGATAGGC ACCATTCTGG CAATCTTTTC TTCTAAGGCA 5820 

GAAAGCTTGC TOTAAGOAAT AAAGTGOTAA TGGTCGCCAT GCGGAATCGC AACTCCATTT 5880 

GGTGTACGAC TGATAATCTT AGCAGGGTCA AAGACCAGGC CATCTGATTC ACTGTAACGT 5940 

TCGCCGCTAC GTGAATCATA GAGTTCCTTC AAAAGACTCT GGAGATTTTC AGATTTATTT 6000 

GCTCCCTTGC TAGTTCATCC TTTTGCTACA GATTGCGTGT TATTGTCACT AGCTGTTGAA 6060

GAATAGCTTA ACTGACTCGG TTGCATATTT TTTCCAGCCA GATGTGCTTT ACCTGCTGCT 6120 

AATTCACTAG CAGATAAATC GCTTTTGGGA ATGTAGTGAT AGTQACCTCC ATGAGGAACG 6180 

ATATAAGCAT TACCCGTATC TTCGATAATA TCAGCTGGAT TAAAGACATA ACCATCATTT 6240 

GTCGTATATC GTCCCTGAGA CCTTGCTACA GCAACATTAG AGTTAACCTT CTCATTATCT 6300 

TTGACATGTT CTTCTTTTTG ACGATTGATT TCATCTTTAG TTCGAACATT ATCAGCATGA 6360 

GCTGCATCTT TCAGGTACAC ATAATATTTT CCATCGACCT TGATGATATA ACCACCCTTG 6420 

ACTTCATTGA CAATATCAGC GTCTTTAAGT TGATAGTTTG GATCCTTCAT CAAGAGTTCT 6480 
TCACTAAAGA GGGCATCATA AGGAACTTTC CCATTATAGT AATGATAGTG GTCACCGTGT 6540

CACGTTACAT AGCCCTGATC TCTAATTTTG ATTACAATTT CCTCAGCCTG AATTCCTTCT 6600 

TTCTCCCTAA CCTCCTCTCC TCTCAAGTTT TCACTTTTCT CACTTCACTG CCTCCCATCC 6660 

ACATAAGAGA CACGATTATT GTCCTTATTT TCCTCCCAAC GATGCTCCTT TAGTGCATAG 6720 

GCACATAGAC TCAAGGATAC GATAACAGCT GATCCACCTG CTATATATTT TTTACTAAAT 6780 

TTCATAAATC CCTCATTTCA ATAAATGATG AAGTTTTTTC TCAACTTCTT TTACTTTATT 6840 

AAATAGTTTT CTAAACCCCG GGGTACC 6867 
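The nucleotide listings in this section put up to sixty bases on a line, in blocks of ten, followed by the position of the last base on that line. The short Python sketch below reproduces that layout (inferred from the listings themselves, not taken from a formal specification; `format_listing` is a hypothetical helper) and is handy for checking running positions: for the final line of SEQ ID NO: 192, 6840 plus 27 bases gives the 6867 printed above.

```python
def format_listing(seq: str, per_line: int = 60, block: int = 10) -> list[str]:
    """Lay out a sequence as blocks of ten with the running position at line end."""
    seq = "".join(seq.split()).upper()  # tolerate whitespace in the input
    lines = []
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        lines.append(f"{' '.join(blocks)} {start + len(chunk)}")
    return lines


if __name__ == "__main__":
    for line in format_listing("ACGT" * 20):
        print(line)
```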
(2) INFORMATION FOR SEQ ID NO: 193: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 999 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 193:



CGTTCTAAAA ATGCAGTACG TTTGATTGAG AAATCAGTTA AAGGTATGCT TCCACACAAT 60

ACACTTGGAC GCGCTCAAGG TATGAAGTTG AAAGTATTTG TTGGAGCTGA GCACACTCAC 120

GCTGCACAAC AACCAGAAGT TCTTGACATT TCAGGACTTA TCTAAGGAAA GGAACAATAA 180

AGTATGTCAC AAGCACAATA TGCAGGTACT GGACGTCGTA AAAACGCTGT TGCACGCGTT 240

CGCCTTGTTC CAGGAACTGG TAAAATCACT CTTAACAAAA AAGATGTTGA AGAGTACATC 300

CCACACGCTG ACCTTCGTCT TGTCATCAAC CAACCATTCG CAGTTACTTC AACTGTAGGT 360

TCATACGACG TTTTCGTTAA CGTTATAGGT CGTCGATACG CTCGTCAATC AGGACCTATC 420

CGTCACGGTA TCGCTCGTGC CCTTCTTCAA GTAGACCCAG ACTTCCGCGA TTCATTCAAA 480

CGCGCAGGAC TTCTTACACG TCACTCACGT AAAGTTGAAC GTAAGAAACC AGGTCTTAAG 540

AAAGCTCGTA AAGCATCACA ATTTAGTAAA CGTTAATTCG AAAGAATTAC TATACTTATA 600

CAGACCACCT TTCGGGGTGT TCTTTTTTTA TACTTTCTTA CTAAATTGGT GCAATTGACA 660

CAGTTGTTGC GACTTTAGTC GCTTACAAAT GTGGCTGCAA CCTGACATGG TCAGTTGCCT 720

CAAAACGTTA ATCAATACGA TTATATCAAC GTTTCAAAGC ACTCAAGGGT TTACCCTATG 780

GGTGCTTTTT TCTATACTTT CTAAAAAAGT TTACCCTAAA ATTTGCCCTA AAATTACCCT 840

ACTTATTTTT AAGATGTTGG TAGGCAACTT GTCCAGCAGA TAATGGAACT ATGTTTGAAG 900

TATTAACATA AGTCTTAGTT GTAACGGTAT CGCTATGAGT TAATGCTTCA GAAATGGCTT 960

GCTGCTGGAC TAGCTGCTTC ACCATTGTTT TTAGGATAGT CAGAAATATA CCTTAATTTC 9780
CCAGTCCATT TTTTATCAGG ATACACTTTA GAAGTAAAGC TTACTTCTTG ACCTACAGAA 9840
AOGTTGGCTA GATTGTACTC AGACAATTCT CCCTTGACTT GTAAATTTTC ATTGCTGACA 9900
ATATGAACCA TAACTTCACT CGCCCCTGTT CGAGATTTAG AAACATTGCT ATTGACTTCG 9960
ACCACAGTTC CCTCTAGGGT ACTGAGAACA GTTGTTGCAT CCAATTGACT TTGAGCCTTG 10020
CTTAATTGCG CCGCAGCATC TGCACGCGCA TCACGGGCAT CACCCAATTG AGCGTCAATA 10080
GAAGCAACAG AATTTCCAGC CACTCGACTT GCGCTTTGCA CCGTTGCATC TTCTCCTCCT 10140
ACTGGCGCTG GTAACTGTGG AGCCGQAGCT CAAGCCCCTT CATTTCCTCC TTGATTGACT 10200
TCATTGATAT GACGATCTGC CCTAGCTACT GCTCGACTAG CTGAATCATA GGCCGCCTGC 10260
GCTTCTGAAC TACTGTACTT GACTAAAGCC TGCCCTTCGC TGACCTTATC GCCCACAGAA 10320
ACAAGGATTT CATCTAAATC ACCCTTACTA GCATCAAAAT AAACATATTG TTCATTTTTT 10380
GCTGTTACTG TCCCTGACAA TAAAACAGAG GAGGCCACGC TTCCTTCCTT GGCAACAACA 10440
AGATGAGTAG GCTCATCTTT TAGAGCAGTC TGAGAAGGTT GTCTAAAGAG TAAAATCCCC 10500
CCAGCACCCA ATACAACTAC ACTCGCAGCA CCGATTGCTG CATACAGTTG CCACTTTTTA 10560
GCTTTACCAT TCTTTTTCTT CATAATGAAA CTCCTTTTCT TTTTTACAAT ACTTTGCTAT 10620
TATACCAAAT TTCCCTCCAG CAAACAATAC AGTTCAGGAT TAAACAATCG TTCGGAATTT 10680
TGCTTTTCGG 10690

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8195 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94:
GAGAAAGCGC CCACGTTTCC CCCAAGGGAG AAAGGCGGAC AGGTATCCGC TAAGCGGCCA 60
GGGTCGGAAC AGGAGAGCGC AACGAGGGAG CTTCCCACGC CGAAACCCCT GCTATCTTTA 120
TAGTCCTGTC GGGTTTCGCC ACCTCTGACT TGAGCGTCGA TTTTTGTGAT GCTCGTCAGG 180
GGGGCGGAGC CTATGGAAAA ACGCCAGCAA CGCGGCCTTT TTACGGTTCC TGGCCTTTTG 240
CTGGCCTTTT GCTCACATGT TCTTTCCTCC GTTATCCCCT GATTCTGTGG ATAACCGTAT 300
TACCCCCTTT GAGTGAGCTG ATACCGCTCG CCGCAGCCGA ACGACCGAGC GCAGCGAGTC 360


WO 98/18931

PCT/US97/19588

AGTGAGCGAG CAAGCGGAAC AGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC 420 

GATTCATTAA TGCAGCTGGC ACGACACCTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA 480 

CGCAATTAAT CTGAGTTAGC TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC 540 

GGCTCGTATG TTGTGTGGAA TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA 600

CATGATTACG AATTCGAGCT CGGTACCCGG AAAATCCAGA AAATGCTTGA AAAAAATCCT 660

AOAAGATCGT ATAATACTAA ATTGTAAGGG TTATCACATA TAACTCAAAA AAAGAAAGAA 720 

CAAAAGGAGA CTCAAACTAT GGCTTCTAAA GATTTCCACG TAGTGGCAGA AACAGGTATT 780 

CACCCACCTC CAGCAACATT OTTGGTACAA ACTGCTAGCA AATTTGCTTC AGATATCACT 840

CTTCAGTACA AACOTAAATC AGTTAACCTT AAATCAATTA TGGGTGTTAT GAGTCTTGGT 900 

GTTGGCCAAG GTGCTGACGT AACTATCTCA CCTGAACGTG CACATGCAGA TGACGCTATC 960 

GCTGCAATCT CAGAAACAAT GGAAAAAGAA GGATTGGCAT AAGGGAAATG ACAGAAATGC 1020

TTAAAGGAAT CCCAGCATCT GACGCTGTTG CAGTTGCAAA AGCATATCTA CTCGTTCAGC 1080

CGGATTTGTC ATTTGAGACT ATTACAGTCG AAGATACAAA CGCAGAAGAA GCTCGCCTTG 1140 

ATGCCCCTCT ACAGGCATCA CAAGACGACC TTTCTGTTAT TCGCGAGAAA GCAGTAGGTA 1200 

CGCTCGQTGA AGAAGCAGCT CAAGTTTTTG ATGCTCACTT AATGGTTCTT GCTGACCCAG 1260 

AAATGATCAG CCAAATCAAG GAAACTATCC GTCCGAAGAA AGTGAATGCA GAAGCAGGTC 1320 

TGAAAGAAGT TACAGATATG TTTATCACTA TCTTTGAAGG CATGCAAGAC AACCCATACA 1380

TGCAAGAACG CGCAGCGGAT wTCCCCGACG TGACAAAACG TGTATTGGCA AACCTTCTTG 1440

CTAAAAAATT CCCAAACCCA GCTTCTATCA ATGAAGAAGT CATTCTGATT GCGCATGACT 1500

TGACTCCTTC AGATACAGCT CAATTGGACA AAAACTTTGT AAAAGCTTTT GTAACCAACA 1560 

TTGGTGGACG TACAAGCCAC TCAGCTATCA TGGCACGTAC ACTTGAAATT GCTCCTCTAT 1620 

TAGCTACAAA TAACATCACT GAAATCGTTA AAGACGGTGA CATCCTTGCT GTTAACGGGA 1680 

TCACTGGAGA AGTGATTATC AACCCAACAG ATGAACAAGC GGCACAATTT AAAGCACCTG 1740 

GTGAAGCCTA TGCGAAACAA AAAGCTGAAT GGGCACTTTT GAAAGATGCT CAAACAGTGA 1800 

CTGCTGACGG TAAACACTTC GAGTTGGCTG CTAATATCGG TACTCCAAAA GACGTTGAAG 1860 

GTGTTAACAA CAACGGTGCA GAAGCTGTTG GACTTTACCG TACAGAGTTC TTGTACATGG 1920 

ATTCTCAAGA CTTCCCAACT GAAGATGAGC AGTATGAAGC ATACAAGGCT GTTCTTGAAG 1980 

CAATGAACGG TAAACCTGTT GTCGTTCGTA CAATGGATAT CGGTGGAGAT AAGGAACTTC 2040 

CTTACTTCCA TATGCCTCAC GAAATGAACC CATTCCTTGG ATTOCGTGCT CTTCGTATCT 2100 

CTATCTCTCA CACTGCACAT GCTATGTTCC GCACACAAAT CCGTGCTCTT CTTCGTGCGT 2160 
CTGPTCACGG TCAATTGCGT ATCATGTTCC CAATGGTTGC GCTCTTGAAA GAATTCCGTG 2220

CAGCGAAAGC AGTCTTTGAT GAAGAAAAAG CAAACCTTCT TGCTGAAGGT GTTGCAGTTG 2280

CGGATAACAT CCAAGTTGGT ATCATGATCG AGATTCCTGC AGCGGCTATG CTTGCAGACC 2340

AATTTGCTAA AGAAGTTGAC TTCTTCTCAA TTGGTACAAA CGACTTGATC CAATATACAA 2400

TGGCAGCAGA CCOTATGAAC GAACAACTTT CATACCTTTA CCAACCATAC AACCCATCAA 2460

TCCTACGCTT GATTAACAAT GTGATCAAAG CAGCTCAOGC TGAAGGTAAA TGGGCTGGTA 2520

TQTGTGGTGA GATGGCTGGT GACCAACAAG CTGTTCCACT TCTTGTCGGA ATGGGCTTGG 2580

ATGACTTCTC TATGTCAGCA ACATCTCTAC TTCGTACACG CAGCTTGATG AAGAAACTCG 2640

ACACAGCTAA GATGGAAGAG TACGCAAACC GTGCCCTTAC AGAATGCTCA ACAATGGAAG 2700

AAGTTCTTGA ACTTCAAAAA GAATACGTTA ATTTTGATTA ATCGAAAAGT CCCTGCAACT 2760

CAGTTACAGG GATTTTTTTG ATATTTTAAA AAGAATTTTC AAGAAAATCT TTCTTATAGA 2820

AAGTCCAACC TTGAAAAAGT AGTGGTCAGA ACAAAAAATA CTTAAATGGT TCATAAAATT 2880

CTTGACAAGT TGGATATTTA GGAGTAAACT ATTAACCAGT TAAGTAATAG AGAGGAGTTT 2940

CTCCAATTTA CAAATCAATT GCAACTACAA ATATCAAATA GAAAGAGAGT TTCGATGAAA 3000

ATTAATAAGA AATACCTTGT TGGTTCTGCG GCACTTTGAT TTTAAGTCTT TGTTCTTACC 3060

AGTTGGGACT GTATCAAGCT AGAACGGTTA AGGAAAATAA TCGTGTTTCC TATATAGATG 3120

GAAAACAAGC GACGCAAAAA ACGGAGAATT TGACTCCTGA TGAGGTTAGC AAGCGTGAAG 3180

GAATCAATGC TGAGCAAATC GTCATCAAGA TAACAGACCA AGGCTATCTC ACTTCACATG 3240

GCGACCACTA TCATTATTAC AATGGTAAGG TTCCTTATGA CGCTATCATC AGTGAAGAAT 3300

TACTCATGAA AGATCCAAAC TATAAGCTAA AAGATGAGGA TATTGTTAAT GAGGTCAAGG 3360

GTGGATATGT TATCAAGGTA GATCGAAAAT ACTATGTTTA CCTTAAGGAT GCTGCCCACG 3420

CGCATAAOGT CCGTACAAAA GAGGAAATCA ATCGACAAAA ACAAGAGCAT ACTCAACATC 3480

GTGAAGGTGG AACTCCAAGA AACGATGGTG CTGTTCCCTT GCCACOTTCG CAAGGACCCT 3540

ATACTACAGA TGATGOTTAT ATCTTTAATG CTTCTGATAT CATAGAGGAT ACTGGTGATG 3600

CTTATATCGT TCCTCATGGA GATCATTACC ATTACATTCC TAAGAATCAG TTATCAGCTA 3660

GCGAGTTCCC TGCTGCAGAA GCCTTCCTAT CTGGTCGAGG AAATCTGTCA AATTCAAGAA 3720

CCTATCGCCG ACAAAATAGC GATAACACTT CAAGAACAAA CTGGGTACCT TCTGTAAGCA 3780

ATCCAGGAAC TACAAATACT AACACAAGCA ACAACAGCAA CACTAACAGT CAAGCAAGTC 3840

AAAQTAATGA CATTGATAGT CTCTTGAAAC AGCTCTACAA ACTCCCTTTG AGTCAACGAC 3900
CTGTTCACCG TCAATTGCGT ATCATGTTCC CAATGGTTGC GCTCTTCAAA GAATTCCGTG 2220

CAGCCAAACC AGTCTTTGAT GAAGAAAAAG CAAACCTTCT TGCTGAAGCT GTTGCAGTTG 2280

CGGATAACAT CCAAGTTGGT ATCATGATCG AGATTCCTGC AGCGGCTATG CTTGCAGACC 2340

AATTTGCTAA AGAAGTTGAC TTCTTCTCAA TTGGTACAAA CGACTTGATC CAATATACAA 2400

TGGCAGCAGA CCGTATGAAC GAACAAGTTT CATACCTTTA CCAACCATAC AACCCATCAA 2460

TCCTACGCTT GATTAACAAT GTGATCAAAG CAGCTCACCC TGAAGGTAAA TGGGCTGGTA 2520

TGTGTCGTGA GATGGCTGGT GACCAACAAG CTGTTCCACT TCTTGTCGGA ATGGGCTTGG 2580

ATGAGTTCTC TATGTCAGCA ACATCTGTAC TTCGTACACG CAGCTTGATG AAGAAACTCG 2640

ACACAGCTAA GATGGAAGAG TACGCAAACC GTGCCCTTAC AGAATGCTCA ACAATGGAAG 2700

AAGTTCTTGA ACTTCAAAAA GAATACGTTA ATTTTGATTA ATCGAAAAGT CCCTGCAACT 2760

CAGTTACAGO GATTOTTTTG ATATTTTAAA AAGAATTTTC AAGAAAATCT TTCTTATAGA 2820

AAGTCCAACC TTGAAAAAGT AGTGGTCAGA ACAAAAAATA CTTAAATGGT TCATAAAATT 2880

CTTGACAAGT TGGATATTTA CCAGTAAACT ATTAACCAGT TAAGTAATAG AGAGGACTTT 2940

CTGCAATTTA GAAATGAATT GCAACTAGAA ATATCAAATA GAAAGAGAGT TTCGATGAAA 3000

ATTAATAAGA AATACCTTGT TGGTTCTGCG GCACTTTGAT TTTAAGTGTT TGTTCTTACG 3060

AGTTGGGACT GTATCAAGCT AGAACGGTTA AGGAAAATAA TCGTGTT/PCC TATATAGATG 3120

GAAAACAAGC GACGCAAAAA ACGGAGAATT TGACTCCTGA TGAGGTTAGC AAGCGTGAAG 3180

GAATCAATGC TGAGCAAATC GTCATCAAGA TAACAGACCA AGGCTATCTC ACTTCACATG 3240

GCGACCACTA TCATTATTAC AATGGTAAGG TTCCTTATGA CGCTATCATC AGTGAAGAAT 3300

TACTCATGAA AGATCCAAAC TATAAGCTAA AAGATGAGGA TATTGTTAAT GAGGTCAAGG 3360

GTGGATAPGT TATCAAGGTA GATGGAAAAT ACTATGTTTA CCTTAAGGAT GCTGCCCACG 3420

CGCATAACGT CCGTACAAAA GAGGAAATCA ATCGACAAAA ACAAGAGCAT AGTCAACATC 3480

GTGAAGGTGG AACTCCAAGA AACGATGGTG CTGTTGCCTT QGCACOTTCO CAAGGACGCT 3540

ATACTACAGA TGATCQTTAT ATCTTTAATC CTTCTGATAT CATAGACGAT ACTGGTGATG 3600

CTTATATCGT TCCTCATCCA CATCATTACC ATTACATTCC TAAGAATCAG TTATCAGCTA 3660

GCGAGTTGGC TGCTCCAGAA GCCTTCCTAT CTCGTCCAGG AAATCTGTCA AATTCAAGAA 3720

CCTATCGCCG ACAAAATAGC GATAACACTT CAAGAACAAA CTGGGTACCT TCTGTAAGCA 3780

ATCCAGGAAC TACAAATACT AACACAAGCA ACAACAGCAA CACTAACAGT CAAGCAAGTC 3840

AAAQTAATGA CATTGATAGT CTCTTGAAAC AGCTCTACAA ACTGCCTTTG AGTCAACGAC 3900
ATGTAGAATC TGATGGCCTT GTCTTTGATC CAGCACAAAT CACAAGTCGA ACAGCTAGAG 3960

CTGTTGCAGT GCCACAOGGA GATCATTACC ACTTCATCCC TTACTCTCAA ATGTCTGAAT 4020

TGGAAGAACG AATCGCTCGT ATTATTCCCC TTCGTTATCC TTCAAACCAT TCGCTACCAG 4080

ATTCAAGGCC AGAACAACCA AGTCCACAAC CGACTCCGGA ACCTAGTCCA GGCCCGCAAC 4140

CTGCACCAAA TCTTAAAATA GACTCAAATT CTTCTTTGGT TAGTCAGCTG GTACGAAAAG 4200

TTGGGGAAGG ATATCTATTC GAAGAAAAGG GCATCTCTCG TTATGTCTTT GCGAAAGATT 4260

TACCATCTGA AACTGTTAAA AATCTTGAAA GCAACTTATC AAAACAAGAG AGTGTTTCAC 4320

ACACTTTAAC TGCTAAAAAA GAAAATGTTG CTCCTCCTGA CCAAGAATTT TATGATAAAG 4380

CATATAATCT GTTAACTGAG GCTCATAAAG CCTTGTTTGA AAATAAGGGT CGTAATTCTG 4440

ATTTCCAAGC CTTAGACAAA TTATTAGAAC GCTTGAATGA TGAATCGACT AATAAAGAAA 4500

AATTGGTAGA TGATTTATTG GCATTCCTAG CACCAATTAC CCATCCAGAG CGACTTGGCA 4560

AACCAAATTC TCAAATTGAG TATACTGAAG ACGAAGTTCG TATTGCTCAA TTAGCTGATA 4620

AGTATACAAC GTCAGATGOT TACATTTTTG ATGAACATGA TATAATCAGT GATGAAGGAG 4680

ATGCATATGT AAOGCCTCAT ATGGGCCATA GTCACTGGAT TGGAAAAGAT AGCCTTTCTG 4740

ATAACCAAAA AGTTGCACCT CAAGCCTATA CTAAAGAAAA AGGTATCCTA CCTCCATCTC 4800

CAGACGCAGA TGTTAAAGCA AATCCAACTG GACATAGTGC ACCAGCTATT TACAATCGTG 4860

TGAAACGGGA AAAACGAATT CCACTCGTTC GACTTCCATA TATGGTTGAG CATACAGTTG 4920

AGCTTAAAAA CGGTAATTTG ATTATTCCTC ATAAGGATCA TTACCATAAT ATTAAATTTG 4980

CTTGGTTTGA TGATCACACA TACAAAGCTC CAAATGGCTA TAOCTTGGAA GATTTGTTTG 5040

CGACGATTAA GTACTACGTA GAACACCCTG ACGAACGTCC ACATTCTAAT GATGGATGGG 5100

GCAATGCCAG TGAGCATGTG TTAGGCAAGA AAGACCACAG TGAAGATCCA AATAAGAACT 5160

TCAAAGCGGA TGAAGAGCCA GTAGAGGAAA            GCCAGAAGTC CCTCAAGTAG 5220

AGACTGAAAA AGTAGAAGCC CAACTCAAAG AAGCAGAAGT TTTGCTTGCG AAAGTAACGG 5280

ATTCTAGTCT CAAAGCCAAT GCAACAGAAA CTCTAGCTGG TTTACGAAAT AATTTGACTC 5340

TTCAAATTAT GGATAACAAT AGTATCATGG CAGAACCAGA AAAATTACTT GCGTTCTTAA 5400

AAGGAAGTAA TCCTTCATCT GTAAGTAAGG AAAAAATAAA CTAATGAAAA ATGAAAGTCT 5460

CGATAAAGAQ GCTTTCATTT TTATTATGTA TATATGTAAA ATTCTTGACA AGCAATATTA 5520

AAAAGAGTAA ACTATTAAC? AGTTAATTAA CCGGTTTATT ACTTTATAGT GAATCAAATA 5580

TACTTAAGAA AAGAGGAAAG AATGAAAATT AATAAAAAAT ATCTAGCAGG TTCAGTGGCA 5640

GTCCTTGCCC TAAGTOTTI^G TTCCTATGAA CTTGGTCCTC ACCAAGCTGG TCACGTTAAG 5700
AAAGAGTCTA ATCGACTTKC TTATATAGAT GCTGATCAGC CTGGTCAAAA CCCAGAAAAC 5760

TTGACACCAG ATGAAGTCAG TAAGAGGGAG GGCATCAACG CCGAACAAAT CGTCATCAAG 5820

ATTACGGATC AAGGTTATGT GACCTCTCAT GGAGACCATT ATCATTACTA TAATGGCAAG 5880

GTCCCTTATG ATGCCATCAT CAGTGAAGAG CTCCTCATGA AAGATCCGAA TTATCAGTTG 5940

AAGGATTCAG ACATTCTCAA TGAAATCAAG GGTGGTTATG TTATCAAGGT AGATCGAAAA 6000

TACTATGTTT ACCTTAAGGA TGCAGCTCAT COCGATAATA TTCGGACAAA AGAAGAGATT 6060

AAACGTCAGA AGCAGGAACA CACTCATAAT CACGGGGGTG GTTCTAACGA TCAAGCAGTA 6120

GTTGCAGCCA CAGCCCAAGC ACGCTATACA ACGGATGATG GTTATATCTT CAATGCATCT 6180

GATATCATTG AGGACACGGG TGATGCTTAT ATCGTTCCTC ACGGCGACCA TTACCATTAC 6240

ATTCCTAAGA ATGAGTTATC AGCTAGCGAG TTAGCTGCTG CAGAAGCCTA TTGQAATGGG 6300

AAGCAGGGAT CTCGTCCTTC TTCAAGTTCT AGTTATAATG CAAATCCACC TCAACCAAGA 6360

TTGTCAGAGA ACCACAATCT GACTGTCACT CCAACTTATC ATCAAAATCA AGGGGAAAAC 6420

ATTTCAAGCC TTTTACCTCA ATTCTATGCT AAACCCTTAT CAGAACGCCA TCTGGAATCT 6480

GATGGCCTTA TTTTCCACCC AGCGCAAATC ACAACTCGAA CCCCCAGACG TGTAGCTGTC 6540

CCTCATGGTA ACCATTACCA CTTTATCCCT TATGAACAAA TGTCTGAATT GGAAAAACGA 6600

ATTGCTCGTA TTATTCCCCT TCGTTATCGT TCAAACCATT GGGTACCAGA TTCAAGACCA 6660

GAACAACCAA GTCCACAATC GACTCCGGAA CCTAGTCCAA GTCCGCAACC TGCACCAAAT 6720

CCTCAACCAG CTCCAAGCAA TCCAATTGAT GAGAAATTGG TCAAAGAAGC TGTTCGAAAA 6780

GTAGGCGATG GTTATGTCTT TGAGGAGAAT GGAGTTTCTC GTTATATCCC AGCCAAGGAT 6840

CTTTCAGCAG AAACAOCACC* AGfiPATTfiAT AfiPAiAfrv^n f*n*mnh.w h AAGTTTATCT 6900

CATAAGCTAG GAGCTAAGAA AACTGACCTC CCATCTAGTG ATCGACAATT TTACAATAAG 6960

GCTTATGACT TACTAGCAAG AATTCACCAA GATTTACTTG ATAATAAAGG TCGACAAGTT 7020

GATTTTGAGG CTTTGGATAA CCTGTTGGAA CGACTCAAGG ATGTCYCAAG TGATAAAGTC 7080

AAGTTAGTGG ATGATATTCT TGCCTTCTTA GCTCCGATTC GTCATCCACA ACGTTTAGGA 7140

AAACCAAATG CGCAAATTAC CTACACTGAT GATGAOATTC AAGTAGCCAA CTTCGCAGGC 7200

AAGTACACAA CAGAAGACGG TTATATCTTT GATCCTCGTG ATATAACCAG TGATGAGGGG 7260

CATGCCTATG TAACTCCACA TATGACCCAT AGCCACTGGA TTAAAAAAGA TAGTTTGTCT 7320

GAAGCTGAGA GAGCGGCACC CCAGGCTTAT GCTAAAGAGA AAGGTTTGAC CCCTCCTTCG 7380

ACAGACCATC AGGATTCAGG AAATACTGAG GCAAAAGGAG CAGAACCTAT CTACAACCGC 7440
CTGAAAGCAG CTAAGAAGGT GCCACTTGAT CGTATGCCTT ACAATCTTCA ATATACTGTA 7500

CAAGTCAAAA ACGGTAGTTT AATCATACCT CATTATGACC ATTACCATAA CATCAAATTT 7560 

GAGTGGTTTG ACGAAGGCCT TTATGAGCCA CCTAACCCCT ATACTCTTGA CCATCTTTTG 7620 

GCGACTGTCA AGTACTATGT CGAACATCCA AACGAACGTC CGCATTCAGA TAATCGTTTT 7680 

GGTAACGCTA GCGACCATGT TCGTAAAAAT AACGTACACC AAGACAGTAA ACCTGATGAA 7740 

GATAAGGAAC ATGATGAAGT AAGTGAGCCA ACTCACCCTG AATCTGATGA AAAACAGAAT 7800 

CACGCTGGTT TAAATCCTTC AGCAGATAAT CTTTATAAAC CAAGCACTGA TACGGAAGAG 7860 

ACAGAGGAAG AAGCTGAAGA TACCACAGAT GAGGCTGAAA TTCCTCAAGT AGAGAATTCT 7920 

CTTATTAACC CTAACATAGC ACATGCGGAG GCCTTGCTAG AAAAAGTAAC AGATCCTAGT 7980 

ATTAGACAAA ATGCTATGGA GACATTGACT GGTCTAAAAA CTACTCTTCT TCTCGGAACG 8040 

AAAGATAATA ACACTATTTC AGCAGAAGTA GATAGTCTCT TGGCTTTGTT AAAAGAAAGT 8100 

CAACCGGCTC CTATACAGTA GTAAAATGAA TGGAGCATAT TTTATGGAGA AGTAACCTTT 8160

CGTGTTACTT CTCTTTTTTA GAAAAACGTA ACAGA 8195 
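
Each numbered line in the listing above carries up to six 10-base groups followed by the running position of its last base, so the bases on a line plus the previous position should reproduce the printed position. The Python sketch below uses that invariant to flag OCR damage such as non-ACGT characters or garbled position numbers. The function name `parse_listing`, the regular expression, and the `start` offset are assumptions made for illustration, not part of any standard sequence-listing tool; IUPAC ambiguity codes (for example W or Y) are flagged too, so every hit still needs manual review.

```python
import re

# Assumed layout: whitespace-separated base groups, then a trailing position number.
LINE_RE = re.compile(r"^(.*\S)\s+(\d+)$")


def parse_listing(lines, start=0):
    """Return (sequence, problems) for numbered sequence-listing lines.

    `start` is the position of the base just before the first line
    (0 for a listing that begins at base 1).
    """
    parts = []
    problems = []
    count = start
    for lineno, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # blank separator lines carry no bases
        match = LINE_RE.match(text)
        if match is None:
            problems.append((lineno, "no trailing position number"))
            continue
        bases = "".join(match.group(1).split()).upper()
        declared = int(match.group(2))
        odd = sorted(set(bases) - set("ACGT"))
        if odd:
            problems.append((lineno, "non-ACGT characters: " + "".join(odd)))
        count += len(bases)
        if count != declared:
            problems.append((lineno, f"position {declared} != running count {count}"))
        parts.append(bases)
    return "".join(parts), problems
```

Because the running count is never resynchronized, a single missing or illegible group makes every later line disagree with its printed position; that is deliberate here, since it points to the first damaged line.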
(2) INFORMATION FOR SEQ ID NO: 95:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2004 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95: 



TTTACTAAAA GGAAAAAAGA ACTGATTTCT CAGTCCTTCA TTAATCTTAT TCCACACTAA 60

ATAGGTATGG OTAAACAGGT TGTTGACCTT GGTGAATCTC GACTTCAACG TCTTCGAATT 120

CTTCTACGAT TTCTTCAGCG ATTTCATTGG CAACTTCTTC GCTTCCGTCT TCACCTACAT 180

AGAAGGTTAC GATTTCACTC TCTTCATCCA ACATATGTTT CAAGCTTTCA GTCAATGITT 240

GGTGCATATC AGGGTTTGAC ACAAGAATTT TTCCATCCAC CATACCTAAA TTATCGTTTT 300

CATGGATTTC TAAGCCATCG ATCGTTGTAT CACGCACGGC TGTTGTGACG CTTCCGCTAA 360

CGACATCGCT AAGAGCAGCT GTCATACGCT CTTCGTTTTC TTCAATGGAC TTGCTTGGAT 420

CAAAGGCAAG AAGACTTGTC ATACCTTGAG GAAGAGTGCG AGCCTCTACC ACTACCGCTG 480

GTTGCTCCAA AACTTCTGCC GCAGATTGAG CTGCCATGAA GATGTTCTTG TTGTTTGGCA 540

AGAAGATCA*? GTTACGGGCA TTAACCTGTT CAACAGCCTT GATAAAGTCT TCTGTTGAAG 600

GGTTCATGGT TTGACCGCCT TCGATAACAT AATCCACGCC TTGAGAACAG AAGATATCTG 660
43. Lysates from frozen human brain tissue were prepared as in (24). Radioactive
RT-PCR was performed in a total volume of 50 μl containing cDNA synthesized from
0.25 μg RNA, 20 mM Tris-HCl pH 8.4, 50 mM KCl, 1.5 mM MgCl2, 0.2 mM dNTPs,
17 μCi [α-32P]CTP, and 0.4 μM of the primers as follows: hBDNF5',
5'-AGCCACAATCCCAACCACGA-3'; hBDNF3', 5'-GCACACCTCGCTAGCCCAAC-3'. PCR
amplification was carried out for 30 cycles. Each cycle consisted of the following
steps: 94°C for 30 s, 57°C for 30 s, and 72°C for 30 s. The same amount of each
cDNA was also amplified, independently, with primers specific for SNAP-25
(synaptosomal-associated protein 25, a presynaptic membrane-associated protein
localized in growth cones, axons, and presynaptic terminals): SNAP-25 5',
5'-CAAATGATGCCCGAGAAAAT-3'; SNAP-25 3', 5'-GGAATCAGCCTTCTCCATGA-3'. PCR products
were separated by nondenaturing 8% polyacrylamide gel electrophoresis and
visualized by autoradiography. BDNF levels were quantified and normalized
relative to SNAP-25 levels.
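
Note 43 quantifies BDNF bands and normalizes them to SNAP-25 amplified from the same cDNA. A minimal Python sketch of that normalization, assuming band intensities have already been measured (for example by densitometry of the autoradiograph). The sample names and numbers are hypothetical, and expressing the result relative to a control sample is an added convenience rather than a step stated in the note.

```python
def normalize_to_reference(target, reference):
    """Divide each sample's target-band signal by its reference-band signal."""
    if target.keys() != reference.keys():
        raise ValueError("target and reference must cover the same samples")
    ratios = {}
    for sample, signal in target.items():
        ref = reference[sample]
        if ref <= 0:
            raise ValueError(f"reference signal for {sample!r} must be positive")
        ratios[sample] = signal / ref
    return ratios


def relative_to(ratios, control):
    """Express normalized ratios as a fraction of one control sample."""
    base = ratios[control]
    return {sample: value / base for sample, value in ratios.items()}


# Hypothetical band intensities (arbitrary units).
bdnf = {"control": 1200.0, "patient": 600.0}
snap25 = {"control": 1000.0, "patient": 1000.0}
ratios = normalize_to_reference(bdnf, snap25)
print(relative_to(ratios, "control"))  # {'control': 1.0, 'patient': 0.5}
```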

44. V. O. Ona et al., Nature 399, 263 (1999).

45. Total cellular lysates from conditionally immortalized CNS cells (13, 27) were
obtained in a buffer containing Tris 50 mM pH 7.4, 5 mM NaCl, Triton X-100 1%,
1 mM DTT, 15 mM EGTA supplemented with 1:100 of Protease Inhibitor Cocktail
(Sigma). Immunoprecipitates were obtained by incubating the total cellular lysate
(from 4 × 10^5 cells) with Mab2166 (1:1000) following conventional
immunoprecipitation protocols and loaded. The blotted proteins were exposed to
antibody to Htt Mab2166 (dilution 1:5000; Chemicon, Temecula, CA). RNA was
reverse-transcribed into single-stranded cDNA using SuperScript II RNase H-
Reverse Transcriptase (Life Technologies) as described by the manufacturer. PCR
was performed in a total volume of 50 μl containing 1 μg cDNA, 20 mM Tris-HCl
pH 8.4, 50 mM KCl, 1.5 mM MgCl2, 0.2 mM dNTPs, 5% dimethyl sulfoxide (DMSO),
0.4 nM of Htt-specific primers (5'-CGACCCTGGAAAAGCTGATGAA-3' and
5'-CACACGGTCTTTCTTGGTAGCTGA-3'), and 2 U Taq polymerase (Life Technologies).
Amplification was carried out for 25 cycles. Each cycle consisted of the
following steps: 94°C for 30 s, 56°C for 30 s, and 72°C for 60 s. PCR products
were separated by electrophoresis on a 2% agarose gel and visualized by staining
with ethidium bromide.
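
Notes 43 and 45 give per-cycle step times, so the time spent in the repeated steps can be checked with simple arithmetic. The Python sketch below ignores ramp times and any initial denaturation or final extension, which the notes do not specify; the `Step` type and the function name are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float  # hold temperature in degrees Celsius
    seconds: int   # hold time per cycle


def cycling_seconds(steps, cycles):
    """Total hold time of the repeated steps (no ramps, no initial/final holds)."""
    return cycles * sum(step.seconds for step in steps)


bdnf_program = [Step(94, 30), Step(57, 30), Step(72, 30)]  # note 43, 30 cycles
htt_program = [Step(94, 30), Step(56, 30), Step(72, 60)]   # note 45, 25 cycles

print(cycling_seconds(bdnf_program, 30) / 60)  # 45.0
print(cycling_seconds(htt_program, 25) / 60)   # 50.0
```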
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Complete Genome Sequence of 
a Virulent Isolate of 
The 2,160,837-base pair genome sequence of an isolate of Streptococcus
pneumoniae, a Gram-positive pathogen that causes pneumonia, bacteremia,
meningitis, and otitis media, contains 2236 predicted coding regions; of these,
1440 (64%) were assigned a biological role. Approximately 5% of the genome
is composed of insertion sequences that may contribute to genome
rearrangements through uptake of foreign DNA. Extracellular enzyme systems for the
metabolism of polysaccharides and hexosamines provide a substantial source
of carbon and nitrogen for S. pneumoniae and also damage host tissues and
facilitate colonization. A motif identified within the signal peptide of proteins
is potentially involved in targeting these proteins to the cell surface of
low-guanine/cytosine (GC) Gram-positive species. Several surface-exposed proteins
that may serve as potential vaccine candidates were identified. Comparative
genome hybridization with DNA arrays revealed strain differences in S.
pneumoniae that could contribute to differences in virulence and antigenicity.



Streptococcus pneumoniae (pneumococcus) has
played a pivotal role in the fields of genetics and
microbiology. The pioneering studies of Avery,
MacLeod, and McCarty in 1944 (1) demonstrated
that DNA is the true hereditary material by
transforming a noncapsulated, avirulent S. pneumoniae
strain with DNA from a capsulated virulent
strain. This work highlighted the importance
of the bacterial polysaccharide capsule as
a key pathogenicity factor.

As a human pathogen, S. pneumoniae is the
most common bacterial cause of acute respiratory
infection and otitis media and is estimated
to result in over 3 million deaths in children
every year worldwide from pneumonia,
bacteremia, or meningitis (2). Even more deaths
occur among elderly people, among whom S.
pneumoniae is the leading cause of community-acquired
pneumonia and meningitis (3). Since
1990, the proportion of penicillin-resistant strains
has increased from 1 to 5% of isolates to 25 to 80%,
and many strains are now resistant to
commonly prescribed antibiotics such as
penicillin, macrolides, and fluoroquinolones (4).

The complete genome sequence of a capsular
serotype 4 isolate of S. pneumoniae [designated
TIGR4 (5); TIGR indicates The Institute
for Genomic Research] was determined by the
random shotgun sequencing strategy (6) (GenBank
accession number AE005672; see
www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl).
This clinical isolate was taken from the
blood of a 30-year-old male patient in
Kongsvinger, Norway, and is highly invasive
and virulent in a mouse model of infection (7).

The genome consists of a single circular
chromosome of 2,160,837 base pairs (bp) with a
G + C content of 39.7%. Base pair 1 of the
chromosome was assigned within the putative
origin of replication. Of the 2236 genes
identified (8), 1155 are located on the right of the
origin of replication, and 916 (79%) of these are
transcribed in the same direction as DNA
replication; similarly, 1081 genes are on the left of
the origin of replication, and 857 (79%) of them
are transcribed in the same direction [Fig. 1 and
Web fig. 1 (9)]. This type of gene orientation
bias appears to be a common feature of low-GC
Gram-positive organisms (10).
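
The G + C content and the replication-direction bias above are simple ratios. A minimal Python sketch, assuming the chromosome is available as a plain string. It also includes a windowed GC skew, (G - C)/(G + C), the quantity plotted in Fig. 1, whose sign change is commonly used to locate the replication origin and terminus. The function names and the non-overlapping window choice are illustrative assumptions; only the gene counts come from the text.

```python
def gc_content(seq):
    """Fraction of G + C among unambiguous A/C/G/T bases (other symbols ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def gc_skew(seq, window):
    """(G - C) / (G + C) for consecutive, non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def codirectional_fraction(codirectional, total):
    """Share of genes transcribed in the same direction as replication."""
    return codirectional / total


# Gene counts reported in the text for the two halves of the TIGR4 chromosome.
right = codirectional_fraction(916, 1155)
left = codirectional_fraction(857, 1081)
print(f"right: {right:.1%}, left: {left:.1%}")  # right: 79.3%, left: 79.3%
```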



Although the S. pneumoniae genome was
reported to contain six ribosomal RNA (rRNA)
operons (11), the TIGR4 isolate contains only
four rRNA operons. Only 12 of the 58 tRNAs
Fig. 1. Circular representation of the S. pneumoniae TIGR4 genome
and comparative genome hybridizations using microarrays. Comparative
genome hybridizations are used to identify genomic differences
between the TIGR4 isolate and strains R6 and D39, using a preliminary
microarray. Results are displayed on the third and fourth circles.
Genes were classified in four groups: (i) gene not present on the array
and not analyzed (black) (394 genes, 17% of total); (ii) ortholog
present in the test strain (green); (iii) ortholog absent in the test
strain (red); and (iv) ambiguous result (blue). The Cy3/Cy5 ratio
(TIGR4 signal/test strain signal) cutoffs for each category were determined
subjectively as Cy3/Cy5 = 1.0 to 3.0, green; 3.0 to 10.0, blue; and
>10.0, red. There were a number of loci for which hybridization ratios
fell between what is expected for gene presence or absence (Cy3/Cy5
ratios between 3.0 and 10.0). Ambiguous results (blue bars) can be
explained in at least two ways: (i) The gene may be highly diverged in
R6 and/or D39 relative to the TIGR4 isolate. (ii) Alternatively, the
gene may be absent in R6 and/or D39 but still be able to produce a
hybridization signal, because the TIGR4 isolate gene is a member of a
paralogous gene family or a repetitive element. The outer circle shows
predicted coding regions on the plus strand, color-coded by role categories:
salmon, amino acid biosynthesis; light blue, biosynthesis of cofactors,
prosthetic groups, and carriers; light green, cell envelope; red, cellular
processes; brown, central intermediary metabolism; yellow, DNA metabolism;
green, energy metabolism; purple, fatty acid and phospholipid metabolism;
pink, protein fate/synthesis; orange, purines, pyrimidines, nucleosides, and
nucleotides; blue, regulatory functions; grey, transcription; teal, transport
and binding proteins; black, hypothetical and conserved hypothetical
proteins. The second circle shows predicted coding regions on the minus
strand, color-coded by role categories. The third circle shows strain R6
genes. The fourth circle shows strain D39 genes. The fifth circle shows an
atypical nucleotide composition curve; the nine gene clusters that are
absent in strains R6 and D39 are indicated by red bullets. The sixth circle
shows the GC-skew curve. The seventh circle shows IS elements. The
eighth circle shows RUP elements. The ninth circle shows BOX elements.
The tenth circle shows rRNAs in blue, tRNAs in green, and structural
RNAs in red.
are not found adjacent to an rRNA operon [Fig. 1
and Web fig. 1 (9)]. Three structural RNAs were
identified: a tRNA-like/mRNA-like (tm) RNA
(www.indiana.edu/~tmrna/), a signal recognition
particle RNA (12), and a ribonuclease P
RNA (13).
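
The Fig. 1 legend sorts genes by the Cy3/Cy5 (TIGR4/test strain) hybridization ratio. A minimal Python sketch of that binning follows. The legend gives only the ranges 1.0 to 3.0, 3.0 to 10.0, and >10.0, so assigning the shared boundaries (3.0 to "present", 10.0 to "ambiguous") and labeling ratios below 1.0 "unclassified" are assumptions, as are the gene identifiers and ratios in the example.

```python
from collections import Counter


def classify_ratio(ratio):
    """Bin a Cy3/Cy5 ratio into the Fig. 1 categories (boundary handling assumed)."""
    if ratio < 1.0:
        return "unclassified"   # range not covered by the legend
    if ratio <= 3.0:
        return "present"        # green: ortholog present in test strain
    if ratio <= 10.0:
        return "ambiguous"      # blue
    return "absent"             # red: ortholog absent in test strain


# Hypothetical ratios for three genes.
ratios = {"geneA": 1.4, "geneB": 6.2, "geneC": 25.0}
calls = {gene: classify_ratio(r) for gene, r in ratios.items()}
print(calls)
print(Counter(calls.values()))
```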

Biological roles were assigned to 1440
(64%) of the predicted proteins according to the
classification scheme adapted from Riley (14).
Another 359 (16%) predicted proteins matched
proteins of unknown function, and the remaining
437 (20%) had no database match. A total of
260 paralogous protein families were identified
in the TIGR4 isolate (8), containing 823 predicted
proteins (37% of the total).
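
The three annotation classes above should add up to the 2236 predicted proteins, and the quoted percentages follow directly from the counts. A short Python check using only the numbers in the text; the function name is illustrative.

```python
def percent_breakdown(counts, total):
    """Whole-number percentages of `total` for each category count."""
    return {name: round(100 * n / total) for name, n in counts.items()}


TOTAL_PROTEINS = 2236
categories = {
    "biological role assigned": 1440,
    "matches protein of unknown function": 359,
    "no database match": 437,
}
# The three classes partition the predicted proteins.
assert sum(categories.values()) == TOTAL_PROTEINS
print(percent_breakdown(categories, TOTAL_PROTEINS))
print(percent_breakdown({"in paralogous families": 823}, TOTAL_PROTEINS))
```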

Comparative genome analysis identified 258
genes in S. pneumoniae [Web table 1 (9)] that
probably were duplicated after the divergence of
this species from other evolutionary lineages for
which complete genomes are available (8). Such
lineage-specific gene duplications may reveal
species-specific adaptations, because gene
duplication is frequently accompanied by functional
diversification and divergence. These
duplications in S. pneumoniae include bacteriocin
genes, choline-binding proteins, immunoglobulin
A (IgA) proteases, immunity proteins,
glycosyl transferases, and a large number of
hypothetical and conserved hypothetical proteins.
Comparison of the complete set of predicted
proteins of S. pneumoniae with those of other
completely sequenced organisms revealed 1219
proteins that are most similar to a protein from
another low-GC Gram-positive species (Lactococcus
lactis has the most, with 905) [Web fig. 2
(9)]. Only 105 proteins have no similarity to
low-GC Gram-positive proteins [Web table 2 (9)].
Two adjacent genes (SP1467 and SP1468)
displayed a high degree of DNA sequence
identity (76 and 88%, respectively) between S.
pneumoniae and Haemophilus influenzae. Both pairs
of genes, which may be involved in pyridoxine
biosynthesis, are more closely related to each
other than to orthologs in any other species,
which suggests that they were horizontally
transferred between these respiratory pathogens.

The S. pneumoniae genome is rich in insertion
sequences (ISs), which make up ~5%
(101,045 bp) of the TIGR4 chromosome [Table
1, Fig. 1, and Web fig. 1 (9)]. IS genes make up
>3.5% (84 out of 2236) of the genes in S.
pneumoniae, in contrast to other published
genomes in which the percentage ranges from
0 to 3% (see www.tigr.org/tigr-scripts/CMR2/
CMRHomePage.spl). In addition to IS elements,
there are two full-length group II introns
and a 1400-bp fragment of the streptococcal
conjugative transposon Tn5252. The TIGR4
isolate does not contain any large prophage-like
structure or full-length conjugative transposon.
The majority of IS elements appear to be
nonfunctional because of insertions, deletions, and/or
point mutations (Table 1) that result in
frameshifted or degenerated transposase genes.
However, programmed frameshifting may allow the
expression of several of the frameshifted genes
(15). Intact elements are typically families with
98 to 100% nucleotide sequence identity,
probably reflecting "waves" of expansion of IS
element isotypes. Despite the large number of IS
elements, only two genes (encoding hypothetical
proteins SP2178/SP2180 and SP0327/
SP0329) are disrupted, and one gene (encoding
lacX protein SP1194) is truncated by an IS
insertion. This suggests selection against
insertions into most of the S. pneumoniae genes, or
some form of editing to remove these insertions,
or both. Regarding the latter, it is possible that
the complete DNA transformation system
identified in the TIGR4 isolate [Web table 3 (9)]
may allow conversion of IS-disrupted genes by
homologous recombination.
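
The two IS proportions quoted above use different denominators: base pairs of chromosome versus predicted genes. A quick Python check with the counts given in the text; the variable names are illustrative.

```python
CHROMOSOME_BP = 2_160_837
IS_BP = 101_045
TOTAL_GENES = 2236
IS_GENES = 84

is_bp_fraction = IS_BP / CHROMOSOME_BP     # share of chromosome length
is_gene_fraction = IS_GENES / TOTAL_GENES  # share of predicted genes
print(f"IS sequence: {is_bp_fraction:.2%} of the chromosome")  # 4.68%
print(f"IS genes: {is_gene_fraction:.2%} of predicted genes")   # 3.76%
```

Both values are consistent with the rounded statements in the text ("~5%" of the chromosome and ">3.5%" of genes).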

Two types of small, dispersed DNA repeats,
the RUP and the BOX elements, were
identified previously in S. pneumoniae. The
107-bp RUP element is thought to act like a
nonautonomous insertion sequence that is
mobilized by the transposase of IS630-Spn1 (16).
The TIGR4 isolate contains 108 RUP elements,
which insert preferentially into IS elements.
The BOX element is a modular DNA repeat
that is composed of three subunits: boxA, boxB
(which can be present in multiple copies), and
boxC (17). There are 127 BOX elements in the
TIGR4 isolate; of these, 115 are intact (A₁B₁₋₅C₁)
and 12 are incomplete. The BOX elements
do not appear to be linked to competence or
virulence genes, as was previously suggested (17).
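
The modular structure described above (one boxA, a variable number of boxB copies, one boxC) lends itself to a pattern check. A minimal Python sketch, assuming each element has already been reduced to a string of module labels; that encoding and the optional `max_b` cap are hypothetical, and real BOX detection would first require locating the modules in sequence.

```python
import re


def is_intact_box(modules, max_b=None):
    """True if a module string reads boxA, one or more boxB, then boxC.

    `modules` is a hypothetical encoding such as "ABBC"; `max_b` optionally
    caps the number of boxB copies.
    """
    b_count = "+" if max_b is None else "{1,%d}" % max_b
    return re.fullmatch("AB" + b_count + "C", modules) is not None


elements = ["ABC", "ABBBC", "AB", "BC", "ACB"]  # hypothetical module strings
intact = [m for m in elements if is_intact_box(m)]
print(len(intact), len(elements) - len(intact))  # 2 3
```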

There appears to be a system for generating
polymorphic type I restriction enzymes in S.
pneumoniae similar to that found in Mycoplasma
pulmonis (18). Shotgun sequencing revealed
populations of clones from the TIGR4 isolate
that were fusions of type I restriction-modification
enzyme specificity subunit hsdS pseudogenes
SP0505 and SP0507 with the nearby intact
hsdS gene SP0508 [Web fig. 3 (9)]. These
rearrangements, which are recombination events
between conserved inverted repeats (IRs) within
SP0508 and the pseudogenes, might be catalyzed
by a nearby integrase (SP0506). Polymerase
chain reaction (PCR) on chromosomal DNA
using primers inside and outside the hsdS genes
indicated that the chromosomal region between
the IRs was invertible. The specificity subunit
may therefore have up to four possible sequences,
presumably altering the DNA site recognition
of the restriction-modification system and
reducing the efficiency of DNA exchange between
bacteria in the same clone line.

Streptococcus pneumoniae has the widest
substrate utilization range for sugars and
substituted nitrogen compounds of the three completed
genomes of near-commensal residents of the
human upper respiratory tract (H. influenzae,
Neisseria meningitidis, and S. pneumoniae).
Genome analysis suggests that S. pneumoniae
possesses pathways for catabolism of pentitols via
the pentose phosphate pathway, as well as for
cellobiose, fructose, fucose, galactose, galactitol,
glucose, glycerol, lactose, mannitol, mannose,
raffinose, sucrose, trehalose, and
maltosaccharides, which can flow directly into the
glycolytic pathway (Fig. 2). Ten amino acids
and N-acetylglucosamine can potentially be
used as nitrogen and carbon sources. Genome
analysis also revealed a large number of
pathways for the complete or partial synthesis of 14
amino acids and chorismate (Fig. 2).

Streptococcus pneumoniae contains a high
percentage of ATP-dependent transporters, as
has been seen in other organisms lacking an
electron transfer chain (19). Streptococcus
pneumoniae possesses both a complete F-type
proton adenosine triphosphatase (ATPase) and
a V-type ATPase that is probably sodium
ion-specific. It also has a sodium ion/proton
exchanger and several probable sodium ion-driven
transporters (Fig. 2), whose activity would
be dependent on the establishment of a sodium
motive force. Thus, S. pneumoniae can probably
interconvert the proton gradient, the sodium



Fig. 2. Overview of metabolism and transport in S. pneumoniae.
Pathways for energy production, metabolism of organic compounds, and
capsule biosynthesis are shown. There exist other genes in the capsule
biosynthesis locus to which no specific function could be assigned.
Transporters are grouped by substrate specificity as follows: inorganic
cations (green), inorganic anions (pink), carbohydrates/carboxylates
(yellow), amino acids/peptides/amines/purines and pyrimidines (red),
and drug efflux and other (black). Question marks indicate uncertainty
about the substrate transported. Export or import of solutes is
designated by the direction of the arrow through the transporter. The
energy-coupling mechanisms of the transporters are also shown: Solutes
transported by channel proteins are shown with a double-headed arrow;
secondary transporters are shown with two arrowed lines, indicating
both the solute and the coupling ion; ATP-driven transporters are
indicated by the ATP hydrolysis reaction; and transporters with an
unknown energy coupling mechanism are shown with only a single
arrow. Components of transporter systems that function as multisubunit
complexes that were not identified are outlined with dotted lines. Where
multiple homologous transporters with similar substrate predictions
exist, the number of that type of transporter is indicated in parentheses.
Systematic gene numbers (SPXXXX) are indicated next to each pathway
or transporter; those separated by a dash represent a range of
consecutive genes. Details for the PTS transporters are indicated in Web fig. 4
(9). Abbreviations are as follows: ADP, adenosine diphosphate; UMP,
uridine monophosphate; UDP, uridine diphosphate; FucNAc,
N-acetylfucosamine; Gal, galactose; GalNAc, N-acetylgalactosamine; GlcNAc,
N-acetylglucosamine; ManNAc, N-acetylmannosamine; NeurNAc,
N-acetylneuraminate; P, phosphate; PP, diphosphate; Pyr, pyruvate.
ion gradient, and ATP as energy sources, using
its F- and V-type ATPases and its sodium ion/
proton exchanger. This is somewhat similar to
the activity of Treponema pallidum, which
possesses two V-type ATPases, probably for
protons and sodium ions, but no exchanger (20).

Over 30% of the transporters in S.
pneumoniae were predicted to be sugar transporters
(Fig. 2), which is the highest percentage
observed to date in any sequenced prokaryote (19).
Other completely sequenced respiratory tract
organisms, H. influenzae and N. meningitidis,
have a paucity of sugar transporters and are
much more reliant on carboxylates and other
compounds for their carbon needs. This
suggests that S. pneumoniae may occupy a distinct
microenvironment within the respiratory tract.
Host glycoproteins and murein polysaccharides,
as well as its own capsular polysaccharides, may
be major sources of sugars for S. pneumoniae.
Reliance on sugar transport and metabolism
appears to be a common feature of streptococci,
based on their abundance in sugar-rich
environments such as the oral cavity (21).

The S. pneumoniae sugar transporters
primarily consist of phosphoenolpyruvate (PEP)-dependent
phosphotransferase system (PTS)
transporters and ATP-binding cassette (ABC)
transporters. Streptococcus pneumoniae has 21
PTS sugar-specific enzyme II complexes with a
variety of gene and domain arrangements [Web
fig. 4 (9)], more than twice as many as any other
sequenced organism relative to genome size,
again emphasizing the importance of sugars to
the life-style of S. pneumoniae. It also possesses
single copies of the general PTS enzymes
enzyme I and histidine-containing protein (HPr),
as well as an HPr serine kinase for regulatory
purposes. The S. pneumoniae PTS includes
systems specific for fructose, glucose, lactose,
mannose, mannitol, trehalose, N-acetylglucosamine,
and sucrose, as well as a variety of PTS systems
whose sugar specificities remain to be
determined. One PTS system (SP2161 to SP2164)
is encoded within a gene cluster including
all of the genes necessary for fucose
metabolism. N-Acetylfucosamine is a constituent
of the capsule of the TIGR4 isolate, and it is
therefore possible that this system may be a
PTS for the uptake of N-acetylfucosamine
or other fucose derivatives. In addition to
the PTS, there are seven ABC sugar uptake
systems, most of which do not have
cytoplasmic ATP-binding components encoded
with the other components (Fig. 2).

Streptococcus pneumoniae also possesses a variety of ATP- and ion-driven amino acid transporters, as well as transporters for polyamines, uracil, and xanthine. A single ABC transporter lacking a binding protein was found for choline, an important requirement for the streptococcal cell wall. In contrast to the emphasis on sugar transport, only a single transporter was found for monocarboxylates and one for dicarboxylates. Streptococcus pneumoniae has a relatively limited repertoire of transporters for inorganic anions and cations, although this includes a manganese ABC transporter (SP1648 to SP1650) and a zinc transporter (SP2169 to SP2171), which have been associated with virulence (22), as well as three ferric iron and three



Table 1. S. pneumoniae IS families.

IS family* | Name (isotype) | IS size (nt)† | Intact transposase | Truncated or frameshifted | Species with homologous elements‡
IS3 | IS3-Spn | 1359 | 0 | 14 | Sp Ec My Sg Ne Ha La Ba
IS5 | IS1381-Spn | 854-860 | 0 | 12 | La
IS5 | IS1515 | 861 | 0 | 1§ | Sp Fr Cy La
IS30 | IS1239 | 1046 | 0 | 2 | Sp So Cl St Ae Le
IS66 | IS66 | 2484-2498 | 0 | 7 |
IS110 | | ? | 0 | 2 |
IS605 | IS200 | 747 | 2 | 1 | Ec Sa Ye En Cl Ha Vi Wo Th De
IS630 | IS630-Spn1 | 896 | 0 | 12 | Sp Sy Ne
IS1380 | IS1380-Spn | 1703 | 11 | 1 | Ab Sp Ba Xa Kl Sm
ISL3 | IS1167 | 1414-1432 | 8 | 14 | Sp Sh Sd En La St Le Mi
Unknown | | | 0 | 17 |
Total | | | 21 | 84 |

*According to the Mahillon and Chandler classification [J. Mahillon, M. Chandler, Microbiol. Mol. Biol. Rev. 62, 725 (1998)]. †Distance between inverted repeats flanking intact or nontruncated IS elements. ‡Species with the most similar elements in GenBank. BlastP hits with an E value < 10^-20 were included. Key: Ab, Acetobacter; Ae, Aeromonas; Ba, Bacillus; Cl, Clostridium; Cy, Cyanobacterium; De, Deinococcus; Ec, E. coli; En, Enterococcus; Fr, Fremyella; Ha, Haemophilus; Kl, Klebsiella; La, Lactobacillus; Le, Leuconostoc; Mi, Microcystis; My, Mycoplasma; Ne, Neisseria; Sa, Salmonella; Sg, S. agalactiae; Sd, S. gordonii; Sh, S. thermophilus; Sm, Sphingomonas; So, S. pyogenes; Sp, S. pneumoniae; St, Staphylococcus; Sy, Synechocystis; Th, Thermotoga; Vi, Vibrio; Wo, Wolbachia; Xa, Xanthobacter; Ye, Yersinia. §S. pneumoniae element demonstrates functional activity [R. Munoz, R. Lopez, E. Garcia, J. Bacteriol. 180, 1381 (1998)].



Table 2. Subset of S. pneumoniae genes related to virulence containing stretches of iterative DNA that could induce phase variation. Iterative DNA motifs, including homopolymeric tracts, were searched in the TIGR4 genome [see (29)]. The iterative motifs identified in genes related to virulence are displayed. Abbreviations under "location" are as follows: 5', the motif is in the 5' third of the gene; M, the motif is in the middle third; 3', the motif is in the 3' third; P, the motif is within 50 nt upstream of the translation start site. For SP1772, repeats occur in all three parts of the protein. Repeat counts marked [?] are illegible in the source.

ORF | Description | Repeat | Location
SP0071 | Immunoglobulin A1 protease | (AT)4, (TA)4 | M, 3'
SP0102 | Glycosyl transferase | (G)[?] | M
SP0168 | Putative macrolide efflux protein | (TA)4 | 5'
SP0346 | Capsular polysaccharide biosynthesis protein (Cps4A) | (TATT)3 | 5'
SP0349 | Capsular polysaccharide biosynthesis protein (Cps4D) | (A)[?] | 5'
SP0350 | Capsular polysaccharide biosynthesis protein (Cps4E) | (AG)[?] | M
SP0351 | Capsular polysaccharide biosynthesis protein (Cps4F) | (A)[?], (A)9 | 5', 5'
SP0352 | Capsular polysaccharide biosynthesis protein (Cps4G) | (AT)4, (T)8 | 5', M
SP0353 | Capsular polysaccharide biosynthesis protein (Cps4H) | (A)[?] | 5'
SP0462 | Cell wall surface anchor family protein | (GA)4 | M
SP0664 | Putative zinc metalloprotease (ZmpB) | (CAAAA)3 | 5'
SP0689 | UDP-N-acetylglucosamine-N-acetylmuramyl-(pentapeptide) pyrophosphoryl-undecaprenol N-acetylglucosamine transferase | (G)[?], (G)6 | 5'
SP0907 | Putative capsular polysaccharide biosynthesis protein | (C)6 | 5'
SP0966 | Adherence and virulence protein A | (A)8 | 5'
SP1267 | LicC protein | (ATG)[?], (AG)4 | 5', M
SP1272 | Putative polysaccharide biosynthesis protein | (CT)[?], (CT)4 | M, 3'
SP1274 | LicD2 protein | (A)[?] | 5'
SP1492 | Cell wall surface anchor family protein | (CT)4 | 3'
SP1693 | Neuraminidase A, authentic frameshift | [?] | 5'
SP1769 | Glycosyl transferase, authentic frameshift | (C)[?], (CT)4 | 5', M
SP1772 | Cell wall surface anchor family protein | (TCACCCTCGACAAGTGCGTCGGCC)[?] |
SP1950 | Putative bacteriocin formation protein | (T)9 | P
SP2136 | Choline-binding protein (PcpA) | (T)[?], (T)8 | 5'
SP2145 | Antigen, cell wall surface anchor family | (C)6 | 5'
SP2190 | Choline-binding protein A (CbpA) | [?], (T)[?] | 5', M
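The location codes in Table 2 amount to a simple positional rule. The sketch below is a hypothetical helper, not the authors' code; it assumes 0-based coordinates on the coding strand, a gene spanning [start, end) with start at the translation start site, and that a motif is placed by its first base (the table does not say how motifs straddling a boundary were assigned).

```python
from typing import Optional

UPSTREAM_WINDOW = 50  # "P": motif within 50 nt upstream of the translation start


def motif_location(motif_start: int, gene_start: int, gene_end: int) -> Optional[str]:
    """Classify a motif as 5', M, 3' (thirds of the gene) or P (upstream).

    Coordinates are 0-based on the coding strand and the gene spans
    [gene_start, gene_end). Returns None when the motif lies outside both
    the gene and the 50-nt upstream window.
    """
    if gene_start - UPSTREAM_WINDOW <= motif_start < gene_start:
        return "P"
    if not gene_start <= motif_start < gene_end:
        return None
    length = gene_end - gene_start
    third = (motif_start - gene_start) * 3 // length  # 0, 1 or 2
    return ("5'", "M", "3'")[third]


# Hypothetical 300-nt gene starting at position 1000.
for pos in (960, 1010, 1150, 1250, 900):
    print(pos, motif_location(pos, 1000, 1300))
```

Strand handling is deliberately left out: for a gene on the reverse strand, coordinates would first need converting so that the translation start is the lower bound.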
Table 3. S. pneumoniae proteins likely to be exposed on the surface, based on computer predictions [see (33)].

ORF | Description | LPxTG*, choline-binding†, lipoprotein‡, SignalP§, YSIRK‖, atypical¶, repeats#
SP0057 | Beta-N-acetylhexosaminidase (StrH) | + +
SP0069 | Choline-binding protein I (CbpI) |
SP0071 | Immunoglobulin A1 protease (Iga) | + + + +
[?] | Cell wall surface anchor family protein | +
[?] | ABC transporter, substrate-binding protein |
SP0112 | Amino acid ABC transporter, periplasmic amino acid-binding protein, putative | + +
SP0117 | Pneumococcal surface protein A (PspA) | + +
SP0148 | ABC transporter, substrate-binding protein | + +
SP0149 | Lipoprotein | +
SP0191 | Hypothetical protein |
[?] | Hypothetical protein | + +
SP0268 | Alkaline amylopullulanase, putative |
SP0314 | Hyaluronidase | + +
[?] | Cell wall surface anchor family protein, authentic frameshift | + +
SP0377 | Choline-binding protein C (CbpC) | +
SP0378 | Choline-binding protein J (CbpJ) | +
SP0390 | Choline-binding protein G (CbpG) | +
SP0391 | Choline-binding protein F (CbpF) | +
SP0462 | Cell wall surface anchor family protein | +
SP0463 | Cell wall surface anchor family protein | + +
SP0464 | Cell wall surface anchor family protein | + +
SP0468 | Sortase, putative | +
SP0498 | Endo-beta-N-acetylglucosaminidase, putative | + + +
SP0620 | Amino acid ABC transporter, amino acid-binding protein, putative | + +
[?] | Conserved hypothetical protein | + +
SP0641 | Serine protease, subtilase family | + + + +
SP0648 | Beta-galactosidase (BgaA) | +
SP0659 | Thioredoxin family protein | + +
SP0664 | Zinc metalloprotease ZmpB, putative | + + +
[?] | Pneumococcal surface protein, putative | +
[?] | Peptidyl-prolyl cis-trans isomerase, cyclophilin-type | +
SP0845 | Lipoprotein | + +
SP0899 | Conserved hypothetical protein | + +
SP0930 | Choline-binding protein E (CbpE) | + +
[?] | Endo-beta-N-acetylglucosaminidase (LytB) | +
[?] | Protease maturation protein, putative | + +
SP1000 | Thioredoxin family protein |
SP1002 | Adhesion lipoprotein | + +
[?] | Iron-compound ABC transporter, iron-compound-binding protein | +
SP1154 | Immunoglobulin A1 protease (Iga) | + +
SP1394 | Amino acid ABC transporter, amino acid-binding protein | + +
SP1400 | Phosphate ABC transporter, phosphate-binding protein, putative | + +
SP1417 | PspC-related protein, degenerate | +
SP1492 | Cell wall surface anchor family protein | + +
SP1500 | Amino acid ABC transporter, amino acid-binding protein (AatB) | + +
SP1527 | Oligopeptide ABC transporter, oligopeptide-binding protein (AliB) | +
SP1573 | Lysozyme (LytC) | + +
SP1650 | Manganese ABC transporter, manganese-binding adhesion lipoprotein | +
SP1683 | Sugar ABC transporter, sugar-binding protein | + +
SP1690 | ABC transporter, substrate-binding protein | +
SP1772 | Cell wall surface anchor family protein | + +(540)
SP1796 | ABC transporter, substrate-binding protein | +
SP1826 | ABC transporter, substrate-binding protein | +
Table 3. (Continued)

ORF | Description | LPxTG*, choline-binding†, lipoprotein‡, SignalP§, YSIRK‖, atypical¶, repeats#
SP1833 | Cell wall surface anchor family protein | + +
SP1870 | Iron-compound ABC transporter, permease protein | + +
SP1872 | Iron-compound ABC transporter, iron-compound-binding protein | + +
SP1891 | Oligopeptide ABC transporter, oligopeptide-binding protein (AmiA) | + +
SP1897 | Sugar ABC transporter, sugar-binding protein (MsmE) | +
SP1937 | Autolysin (LytA) |
SP1975 | SpoIIIJ family protein | +
SP1992 | Cell wall surface anchor family protein |
SP2041 | SpoIIIJ family protein | + +
SP2084 | Phosphate ABC transporter, phosphate-binding protein (PstS) | + +
SP2108 | Maltose/maltodextrin ABC transporter, maltose/maltodextrin-binding protein (MalX) | +
SP2136 | Choline-binding protein (PcpA) | +
SP2169 | Zinc ABC transporter, zinc-binding lipoprotein (AdcA) | + +
SP2190 | Choline-binding protein A (CbpA) | + + + + +
SP2197 | ABC transporter, substrate-binding protein, putative | + +
SP2201 | Choline-binding protein D (CbpD) |

*Sortase motif. †Choline-binding motif. ‡Lipid attachment motif. §Signal peptide; a Y-score lower limit of 0.3 was used as the cutoff. ‖Signal peptide YSIRK for Gram-positive cell wall-attached proteins. ¶ORFs present in regions of atypical nucleotide composition [see (40)]. #ORFs containing iterative DNA motifs that could induce repeat-associated phase variation; one plus sign is shown per motif (exception: SP1772 contains 540 copies of a 24-nt motif).
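The lipoprotein flag in Table 3 starts from the PROSITE-style pattern quoted in reference (33), {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C. A minimal sketch of the regular-expression half of that test is shown below, assuming standard PROSITE syntax and a hypothetical 40-residue NH2-terminal search window (the paper gives no window size); the additional HMM filter the authors applied is not reproduced, and the toy sequences are invented.

```python
import re
from typing import Optional

LIPOBOX_PROSITE = "{DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C"


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression.

    Handles {..} (excluded residues), [..] (allowed residues), x (any residue),
    single residues, and (n) or (n,m) repeat counts.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(\{[A-Z]+\}|\[[A-Z]+\]|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = m.groups()
        if core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # exclusion set
        elif core == "x":
            core = "."  # any residue
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


LIPOBOX_RE = re.compile(prosite_to_regex(LIPOBOX_PROSITE))


def lipobox_cysteine(protein: str, window: int = 40) -> Optional[int]:
    """0-based index of the candidate lipidated Cys if the first `window`
    residues match the pattern, else None (`window` is an assumption)."""
    m = LIPOBOX_RE.search(protein[:window])
    return m.end() - 1 if m else None


print(prosite_to_regex(LIPOBOX_PROSITE))
print(lipobox_cysteine("MKKIAAVLSLGLLSLAACSN"))  # hypothetical toy sequence
```

Converting the PROSITE string rather than hand-writing the regex keeps the published pattern as the single source of truth.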



phosphate ABC transporters. Overcoming iron and phosphate limitation may also be important for virulence. Streptococcus pneumoniae possesses an ABC efflux system involved in competence (SP0042 and SP0043). The characterized macrolide efflux proteins MefE and MefA (23) are absent from the TIGR4 isolate.

Analysis of the genome sequence suggests that extracellular enzyme systems for the metabolism of polysaccharides and hexosamines are important for providing carbon and nitrogen for this organism and may be important for the synthesis of the capsule and the virulence of this species. Enzyme systems based on N-acetylglucosaminidases, α- and β-galactosidases, endoglycosidases, hydrolases, hyaluronidases, and neuraminidases are present in S. pneumoniae. These enzymes probably enable degradation of host polymers, including mucins, glycolipids, and hyaluronic acid, as well as degradation of the organism's own capsule. These enzymatic activities may serve to increase substrate availability to S. pneumoniae by converting larger polymers to products that can be transported into the cell, while at the same time damaging host tissues and facilitating colonization.

Pathogenesis and virulence in S. pneumoniae are associated with the inflammation and colonization of host tissues and with bypass of the host immune system [Web table 4 (9)] (24). The polysaccharide capsule is considered to be the primary pneumococcal virulence determinant, allowing for the evasion of the host immune response (25). Although no pathway has been biochemically characterized for the synthesis of the type 4 capsular polysaccharide, a proposed pathway for capsular biosynthesis derived from the genome analysis is shown in Fig. 2. A 13-gene cluster (SP0346 to SP0360) was identified that is likely to be involved in capsular biosynthesis and secretion. This region of the genome has an atypical nucleotide composition and is flanked by two IS elements on each side. Outside of the IS elements are the aliA (also called plpA) (SP0366) and dexB (SP0342) genes, which also flank the capsule loci in other S. pneumoniae strains (26). This gene cluster may not represent the complete pathway for capsular biosynthesis, because several other capsular polysaccharide biosynthesis genes are dispersed elsewhere in the genome. An operon of genes involved in the incorporation of phosphorylcholine into teichoic acid is also present in this genome (SP1267 to SP1274), as are all the genes required for peptidoglycan synthesis.

Phase variation has been described in S. pneumoniae and shown to involve variation of multiple cell-surface structures that contribute to the ability of the organism to interact with its host (27). One of the mechanisms involves reversible, high-frequency molecular switching of genes through slippage-like mechanisms at iterative DNA motifs, especially homopolymeric tracts (28). Such motifs were identified in the TIGR4 genome (29), and their locations were correlated with predicted genes and their promoters. In total, 397 genes (18%) contain iterative DNA motifs [Web table 5 (9)], and 25 of these are directly related to virulence (Table 2), including genes from the teichoic acid and capsule pathways that are associated with colony opacity variation (30). In contrast to other pathogenic species, most of the nucleotide repeat-containing genes in S. pneumoniae are not frameshifted. This might reflect the presence of general mismatch repair in S. pneumoniae (31), a process absent in many pathogens (32).
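Why slippage at an iterative motif can switch a gene on or off follows from codon arithmetic: gaining or losing copies of a repeat unit preserves the downstream reading frame only when the net change in length is a multiple of three. The sketch below illustrates this with a hypothetical toy sequence (not taken from the TIGR4 genome) in which a single extra A in a poly(A) tract brings a premature stop codon into frame; the function names are illustrative only.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def stays_in_frame(unit_length: int, copies_gained: int) -> bool:
    """True if gaining (or, with a negative count, losing) repeat copies
    leaves the downstream reading frame unchanged."""
    return (unit_length * copies_gained) % 3 == 0


def first_in_frame_stop(seq: str) -> Optional[int]:
    """Codon index of the first stop codon read from position 0, or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


# Hypothetical toy gene: ATG start, a poly(A) tract, then fixed downstream sequence.
for tract_length in (8, 9):
    toy = "ATG" + "A" * tract_length + "TAAGG"
    print(tract_length, first_in_frame_stop(toy))
```

By the same rule, a one-copy change in a trinucleotide motif such as the (ATG)n tract listed for SP1267 adds or removes a codon without shifting the frame, whereas a one-copy change in a mononucleotide or dinucleotide tract does shift it.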

Sixty-nine proteins that are likely to be exposed on the surface of this organism were identified (Table 3) (33). Genomewide analysis of all predicted signal sequences (34) revealed two discernible clusters. The first cluster contains most of the lipoproteins for which the lipid attachment motif (33) extends beyond the covalently modified cysteine and the membrane-spanning region. This suggests some reuse of lipoprotein signal sequences as evolutionary cassettes. The second cluster, composed of proteins anchored in the cell wall through their sortase motif (33), revealed a previously uncharacterized pentapeptide motif, (Y/F)SIRK (35), usually starting at residue 12 (Table 3). A large fraction of the surface proteins of various species of Streptococcus and Staphylococcus display this motif in their signal peptides. The near-perfect conservation of glycine and serine at the fourth and seventh positions past the pentapeptide, within the predicted transmembrane helix, suggests a specific functional interaction and may reflect a step in cell wall attachment in S. pneumoniae and related species.
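The general YSIRK signature given in reference (34), (Y/F)(S/A)(I/L)(R/K)(R/K)xxxGxxS, translates directly into a regular expression, and it encodes the glycine and serine four and seven positions past the pentapeptide described above. The sketch below is a plain pattern scan under that assumption, not the HMM search the authors used; the function name and toy sequences are hypothetical.

```python
import re
from typing import List

# (Y/F)(S/A)(I/L)(R/K)(R/K) x x x G x x S
# The G is the 4th and the S the 7th residue after the pentapeptide.
YSIRK_RE = re.compile(r"[YF][SA][IL][RK][RK]...G..S")


def ysirk_positions(protein: str) -> List[int]:
    """1-based start positions of YSIRK-like signatures in a protein sequence."""
    return [m.start() + 1 for m in YSIRK_RE.finditer(protein)]


print(ysirk_positions("MNKQKQRYSIRKFSVGVASVLLGA"))  # hypothetical signal peptide
```

In practice the scan would be restricted to the NH2-terminal signal-peptide region, since the motif is reported to start around residue 12.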
Among the newly identified surface-exposed genes are a putative alkaline amylopullulanase (SP0268) and a putative endo-β-N-acetylglucosaminidase (SP0498). These two genes could be involved in the degradation of host polysaccharides. Several cell wall surface anchor family proteins and lipoproteins may also be involved in adherence to host cells. An unusual surface-associated component in this genome is a 4776-amino acid protein (SP1772) that contains 540 imperfect repeats of the amino acid motif SASTSASA (35). This protein is similar to the Lactobacillus brevis surface layer protein (36) and to proteins from S. gordonii and S. cristatus. It is adjacent to seven glycosyl transferases (SP1758, SP1764 to SP1767, SP1770, and SP1771) that could make O-linked glycosylations on the serines in SP1772. This would produce a structure similar to mucins that might also coat the surface of the bacterium or interact with host cellular mucins, although some strains of S. pneumoniae have been shown not to interact with mucins (37).

Comparative genome hybridizations on DNA microarrays were performed (38) between the TIGR4 isolate and both the R6 noncapsulated laboratory strain and the closely related D39 serotype 2 capsulated strain (39). Nine gene clusters in the TIGR4 isolate did not hybridize with the other two strains [Fig. 1 and Web table 6 (9)], which suggests that they are absent or significantly divergent in strains R6 and D39. Six of these regions display an atypical nucleotide composition [Fig. 1 and Web table 7 (9)] (40), which suggests that they were horizontally acquired by the TIGR4 isolate. These include the capsule biosynthesis locus (SP0347 to SP0353), the V-type ATPase locus (SP1315 to SP1322), a gene cluster encoding a cell wall surface anchor protein (SP1772) and seven glycosyl transferases, and a putative macrolide efflux protein (SP0168). In addition to these regions, strains R6 and D39 also lack three putative sortases and two sortase motif proteins (SP0463 to SP0468), as well as choline-binding protein I (SP0069) and an IgA1 protease paralog (SP0071). Similar differences in the capsule locus, IgA1 protease, and choline-binding protein were identified by Hakenbeck et al. (41) by means of an oligonucleotide-based microarray. The majority of the loci that differ between the three strains are surface-exposed and/or related to pathogenesis, and these differences may contribute to differences in virulence and antigenicity between these strains.
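The atypical-composition test cited here as (40) compares the trinucleotide content of each 2000-bp window (windows overlapping by 1500 bp) with that of the whole genome using a χ2 statistic, keeping windows that score 600 or more. A minimal sketch of that procedure follows. It assumes that overlapping trimers counted on both strands stand in for "all six reading frames" and that trimers containing non-ACGT characters are skipped; the function names and the synthetic test genome are hypothetical.

```python
import random
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

TRIMERS = ["".join(p) for p in product("ACGT", repeat=3)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def trimer_counts(seq: str) -> Counter:
    """Overlapping trimer counts on both strands (all six reading frames pooled)."""
    counts: Counter = Counter()
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        counts.update(strand[i:i + 3] for i in range(len(strand) - 2))
    return counts


def atypical_windows(genome: str, window: int = 2000, step: int = 500,
                     threshold: float = 600.0) -> List[Tuple[int, float]]:
    """Return (start, chi-square score) for every window scoring >= threshold.

    step=500 with window=2000 gives the 1500-bp overlap described in (40).
    """
    genome = genome.upper()
    background = trimer_counts(genome)
    total = sum(background[t] for t in TRIMERS)
    freq: Dict[str, float] = {t: background[t] / total for t in TRIMERS}
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        observed = trimer_counts(genome[start:start + window])
        n = sum(observed[t] for t in TRIMERS)  # trimers containing N etc. are ignored
        score = sum((observed[t] - n * freq[t]) ** 2 / (n * freq[t])
                    for t in TRIMERS if freq[t] > 0)
        if score >= threshold:
            flagged.append((start, score))
    return flagged


def random_dna(length: int, rng: random.Random) -> str:
    return "".join(rng.choice("ACGT") for _ in range(length))


# Hypothetical test genome: random sequence with a 2-kb (AT)n island at 30,000-32,000.
rng = random.Random(0)
genome = random_dna(30000, rng) + "AT" * 1000 + random_dna(30000, rng)
print([start for start, _ in atypical_windows(genome)])
```

Only windows overlapping the synthetic (AT)n island should exceed the cutoff; in a real genome the flagged windows would then be merged into regions such as those listed above.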

The complete genome sequence of S. pneumoniae has revealed new insights into the complexity of its biology and metabolism, particularly with regard to the dual role of extracellular enzyme systems in providing essential nutrients while also facilitating the colonization of host tissues. Recent experimental studies based on the preliminary genome sequence of the TIGR4 isolate have revealed new candidate vaccine targets for this species (42). The availability of the complete genome sequence will provide additional avenues for follow-up studies on the basic biology and pathogenicity of S. pneumoniae.
NPAS2: An Analog of Clock Operative in the Mammalian Forebrain

Martin Reick,1 Joseph A. Garcia,2 Carol Dudley,1 Steven L. McKnight1*

Neuronal PAS domain protein 2 (NPAS2) is a transcription factor expressed primarily in the mammalian forebrain. NPAS2 is highly related in primary amino acid sequence to Clock, a transcription factor expressed in the suprachiasmatic nucleus that heterodimerizes with BMAL1 and regulates circadian rhythm. To investigate the biological role of NPAS2, we prepared a neuroblastoma cell line capable of conditional induction of the NPAS2:BMAL1 heterodimer and identified putative target genes by representational difference analysis, DNA microarrays, and Northern blotting. Coinduction of NPAS2 and BMAL1 activated transcription of the endogenous Per1, Per2, and Cry1 genes, which encode negatively activating components of the circadian regulatory apparatus, and repressed transcription of the endogenous BMAL1 gene. Analysis of the frontal cortex of wild-type mice kept in a 24-hour light-dark cycle revealed that Per1, Per2, and Cry1 mRNA levels were elevated during darkness and reduced during light, whereas BMAL1 mRNA displayed the opposite pattern. In situ hybridization assays of mice kept in constant darkness revealed that Per1 mRNA abundance did not oscillate as a function of the circadian cycle in NPAS2-deficient mice. Thus, NPAS2 likely functions as part of a molecular clock operative in the mammalian forebrain.



REPEATS program [G. Benson, M. S. Waterman, Nucleic Acids Res. 22, 4828 (1994)]. The minimum length of homopolymeric tracts was set at eight for A and T and at six for G and C; the minimum was four tandem copies for di- and trinucleotides and three copies for tetra-, penta-, and hexanucleotides. Heptanucleotides and above were not found in three or more copies, except for the imperfect repeats in SP1772. The ratio of the observed frequency of homopolymeric tracts to their expected frequency was determined by means of Markov chain analysis, as described [N. J. Saunders et al., Mol. Microbiol. 37, 207 (2000)]. It revealed that G or C tracts of 8 bp and A or T tracts of 10 and 11 bp are slightly overrepresented.

30. J. O. Kim et al., Infect. Immun. 67, 2327 (1999).

31. O. Humbert, M. Prudhomme, R. Hakenbeck, C. G. Dowson, J. P. Claverys, Proc. Natl. Acad. Sci. U.S.A. 92, 9052 (1995).

32. J. A. Eisen, P. C. Hanawalt, Mutat. Res. 435, 171 (1999).

33. Putative choline-binding motifs [J. L. Garcia, A. R. Sanchez-Beato, F. J. Medrano, R. Lopez, in Streptococcus pneumoniae: Molecular Biology and Mechanisms of Disease, A. Tomasz, Ed. (Mary Ann Liebert, Larchmont, NY, 2000), pp. 231-244] were identified using Pfam hidden Markov model (HMM) PF01473 [A. Bateman et al., Nucleic Acids Res. 28, 263 (2000)]. LPxTG-type Gram-positive anchor regions [M. J. Pallen, A. C. Lam, M. Antonio, K. Dunbar, Trends Microbiol. 9, 97 (2001)] were detected by Pfam HMM PF00746 and by a new HMM built with HMMER 2.1.1 [S. R. Eddy, Bioinformatics 14, 755 (1998)] from a new, curated alignment of the surrounding region in S. pneumoniae. Candidate lipoprotein signal peptides [S. Hayashi, H. C. Wu, J. Bioenerg. Biomembr. 22, 451 (1990)] were flagged by NH2-terminal exact matches to the pattern {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C (35), culled of hypothetical proteins and cytosolic proteins, aligned manually, and used to generate a new HMM. Proteins matching both the HMM and the regular expression are predicted lipoproteins. Putative signal peptides were identified with SignalP [H. Nielsen, J. Engelbrecht, S. Brunak, G. von Heijne, Protein Eng. 10, 1 (1997)].

34. The NH2-terminal regions of all proteins predicted to have signal sequences were collected for clustering and alignment with ClustalW and were scrutinized. An HMM based on an edited alignment of 40-residue segments around the (Y/F)SIRK motif found several hundred hits to a nonredundant amino acid database. A more general motif, based on the larger family of YSIRK proteins, is (Y/F)(S/A)(I/L)(R/K)(R/K)xxxGxxS (35).

35. Single-letter abbreviations for the amino acid residues are as follows: A, Ala; C, Cys; D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; and Y, Tyr.

36. G. Vidgren, I. Palva, R. Pakkanen, K. Lounatmaa, A. Palva, J. Bacteriol. 174, 7419 (1992).

37. J. Davies et al., Infect. Immun. 63, 2485 (1995).

38. This method is used to identify genomic differences between the TIGR4 strain and strains R6 and D39. All the predicted genes from the TIGR4 strain were amplified by PCR and arrayed on glass microscope slides as previously described [S. Peterson, R. T. Cline, H. Tettelin, V. Sharov, D. A. Morrison, J. Bacteriol. 182, 6192 (2000)]. Genomic DNA for comparative genome hybridization studies was labeled according to protocols provided by J. DeRisi (www.microarrays.org/pdfs/GenomicDNALabel_B.pdf), except that genomic DNA was not digested or sheared before labeling. Arrays were scanned with a GenePix 4000B scanner from Axon (Union City, CA), and individual hybridization signals were quantitated with TIGR SPOTFINDER [P. Hegde et al., Biotechniques 29, 548 (2000)].

39. M. D. Smith, W. R. Guild, J. Bacteriol. 137, 735 (1979).

40. Regions of atypical nucleotide composition were identified by χ2 analysis: The distribution of all 64 trinucleotides (trimers) was computed for the complete genome in all six reading frames, followed by the trimer distribution in 2000-bp windows. Windows overlapped by 1500 bp. For each window, the χ2 statistic on the difference between its trimer content and that of the whole genome was computed. The most atypical regions, with a score of 600 and above, were considered in this analysis.
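The thresholds in reference (29) (homopolymeric tracts of at least 8 A or T and at least 6 G or C, four tandem copies of di- and trinucleotides, three copies of tetra- to hexanucleotides) can be turned into a simple exact-repeat scanner. This is a minimal sketch, not the REPEATS program the authors used: it finds only perfect tandem copies, so imperfect repeats such as those in SP1772 would be missed, and the function names and test sequence are hypothetical.

```python
import re
from typing import List, Tuple

HOMOPOLYMER_MIN = {"A": 8, "T": 8, "G": 6, "C": 6}  # minimum tract length
MIN_COPIES = {2: 4, 3: 4, 4: 3, 5: 3, 6: 3}          # unit length -> minimum copies


def is_primitive(unit: str) -> bool:
    """False if the unit is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(unit)
    return all(unit != unit[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_iterative_motifs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (0-based start, repeat unit, copy number) for each qualifying motif."""
    seq = seq.upper()
    hits = []
    # Homopolymeric tracts.
    for m in re.finditer(r"A+|C+|G+|T+", seq):
        tract = m.group()
        if len(tract) >= HOMOPOLYMER_MIN[tract[0]]:
            hits.append((m.start(), tract[0], len(tract)))
    # Exact tandem copies of di- to hexanucleotide units.
    for k, min_copies in MIN_COPIES.items():
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies and is_primitive(unit):
                hits.append((i, unit, copies))
                i += copies * k  # skip past the run just reported
            else:
                i += 1
    return sorted(hits)


print(find_iterative_motifs("GCAAAAAAAAGCATATATATGCTATTTATTTATTGC"))
```

Non-primitive units are skipped so that, for example, an (A)8 tract is not reported a second time as (AA)4.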



Locomotor activity, body temperature, endocrine hormones, and metabolic rate fluctuate cyclically with a period of 24 hours. The regulatory apparatus that controls circadian rhythm consists of a transcriptional feedback cycle that is evolutionarily conserved in a wide variety of metazoans (1). In mammals, the activating arm of this cycle is executed by a heterodimeric transcription factor composed of the Clock and BMAL1 gene products (2). The Clock:BMAL1 heterodimer binds directly to regulatory sequences of the genes comprising the negative arm of the transcriptional feedback cycle. The negative components of the regulatory apparatus include three period (Per) genes and two cryptochrome (Cry) genes (3-11), whose products function in a poorly understood manner to inactivate the Clock:BMAL1 heterodimer. The duration of Per and Cry activity may be modified by a serine-threonine kinase variously termed casein kinase Iε or Tau in mammals and Doubletime in flies (12-14). In the absence of entraining influences, this regulatory apparatus oscillates rhythmically at or near the 24-hour light-dark cycle (i.e., 12 hours light, 12 hours dark). Entrainment derived from light, food, temperature, and metabolic activity can advance or delay the central regulatory apparatus such that it is properly adapted to the summation of these external zeitgebers.

The master pacemaker of circadian rhythm resides in the suprachiasmatic nucleus (SCN), a small group of neurons located at the base of the optic chiasma within the central nervous system (15). Classical transplantation experiments have demonstrated that the SCN is necessary and sufficient to specify circadian rhythm (16, 17). Surprisingly, the same molecular clock is operative in sites peripheral to the SCN (11, 18), including cultured mammalian cells of non-neural origin (19).

Neuronal PAS domain protein 2 (NPAS2, also termed MOP4) is a member of the basic helix-loop-helix (bHLH)-PAS domain family of transcription factors. The gene encoding NPAS2 is expressed in a stereotypic pattern of brain nuclei located within the mammalian forebrain (20, 21). Upon positional cloning of
CLUSTAL W (1.74) multiple sequence alignment



Q6WNQ5|Q6WNQ5_STRPN
Q8CWR4|Q8CWR4_STRR6
Q8DPQ2|Q8DPQ2_STRR6
Q9AG74|Q9AG74_STRPN
Q9AHT9|Q9AHT9_STRPN
Q8DQ08|Q8DQ08_STRR6



CAYALNQHRS QENK- DNNR 

-MNQIYLRKEERMKINKKYLAGSVATLVLSVCAYELGLHQAQTVK-ENNR 
MQLEISNRKRVSMKINKKYLVGSAAALILSVCSYELGLYQARTVK-ENNR 

MKINKKYLVGSAAALILSVCSYELGLYQARTVK-ENNR 

MKINKKYLVGSAAALILSVCSYELGLYQARTVK-ENNR 

MKINKKYLAGSVAVLALSVCSYELGRHQAGQVKKESNR 

*:* * . ::: * : . ** 

VSYVDGSQSSQKSENLTPDQVSQKEGIQAEQIVIKITDQGYVTSHGDHYH 

VSYIDGKQATQKTENLTPDEVSKREGINAEQIVIKITDQGYVTSHGDHYH 

VSYIDGKQATQKTENLTPDEVSKREGINAEQIVIKITDQGYVTSHGDHYH 

VSYIDGKQATQKTENLTPDEVSKREGINAEQIVIKITDQGYVTSHGDHYH 

VSYIDGKQATQKTENLTPDEVSKREGINAEQIVIKITDQGYVTSHGDHYH 

VSYIDGDQAGQKAENLTPDEVSKREGINAEQIVIKITDQGYVTSHGDHYH 
★ ^ ★ . ★★.*★***★-★*. .*★★.*★★★★**★★***★****★**** 

YYNGKVPYDALFSEELLMKDPNYQLKDADIVNEVKGGYIIKVDGKYYVYL 

YYNGKVPYDAIISEELLMKDPNYQLKDEDIISEIKGGYVIKVDGKYYVYL 

YYNGKVPYDAIFSEELLMKDPNYKLKDEDIVNEVKGGYVIKVDGKYYVYL 

YYNGKVPYDAIISEELLMKDPNYQLKDEDIISEIKGGYVIKVDGKYYVYL 

YYNGKVPYDAIISEELLMKDPNYKLKDEDIVNEVKGGYVIKVDGKYYVYL 

YYNGKVPYDAIISEELLMKDPNYQLKDSDIVNEIKGGYVIKVDGKYYVYL 

^ ★ . ★**★.★★**★★★★★** 

KDAAHADNVRTKDE I NRQKQE HVKDNE KVNSNVAVARSQGRYTTND 

KDAAHADNVRTKEEINRQKQEHSQHREGGTPRNDGAVALARSQGRYTTDD 
KDAAHADNVRTKEEINRQKQEHSQHREGGTPRNDGAVALARSQGRYTTDD 
KDAAHADNVRTKEEINRQKQEHSQHREGGTPRNDGAVALARSQGRYTTDD 
KDAAHADNVRTKEEINRQKQEHSQHREGGTPRNDGAVALARSQGRYTTDD 

KDAAHADN I RTKE E I KRQKQE R S HNHN SRADNAVAAARAQGRYTTDD 

. ;> ** *★.★*★*★★.★ 

PKSDLSASELAAAKAHLAGK — 
PKNELSAS E LAAAKAFLS GRGN 
PKNELSAS E LAAAEAFLS GRGN 
PKNELSAS E LAAAKAFLS GRGN 
PKNELSASE LAAAEAFLS GRGN 
PKSDLSASELAAAQAYWNGK— 



GYVFNPADI 
GYIFNASDI 
GYIFNASDI 
GYIFNASDI 
GYIFNASDI 
GYIFNASDI 



IEDTGNAYIVPHRGHYHYI 
IEDTGDAYIVPHGDHYHYI 
IEDTGDAYIVPHGDHYHYI 
IEDTGDAYIVPHGDHYHYI 
IEDTGDAYIVPHGDHYHYI 
IEDTGDAYIVPHGDHYHYI 



.***★***.★★*★** 



■ ★*★*★**** . * 



NMQP-SQLSYSSTASD NNTQSVAKGSTSKPANKSENL 

LSNSRTYRRQNSDNTSRTNWVPSVSNPGTTNTNTSNNSNTNSQASQSNDI 
LSNSRTYRRQNSDNTSRTNWVPSVSNPGTTNTNTSNNSNTNSQASQSNDI 
LSNSRTYRRQNSDNTSRTNWVPSVSNPGTTNTNTSNNSNTNSQASQSNDI 
LSNSRTYRRQNSDNTSRTNWVPSVSNPGTTNTNTSNNSNTNSQASQSNDI 
QGSRPSSSSSHNANPAQPRLSENHNLTVTPTYHQN-QGENI 



QSLLKELYDSPSAQRYSESDGLVFDPAKIISRTPNGVAIPHGDHYHFIPY 
DSLLKQLYKLPLSQRHVESDGLIFDPAQITSRTANGVAVPHGDHYHFIPY 
DSLLKQLYKLPLSQRHVESDGLVFDPAQITSRTARGVAVPHGDHYHFIPY 
DSLLKQLYKLPLSQRHVESDGLIFDPAQITSRTANGVAVPHGDHYHFIPY 
DSLLKQLYKLPLSQRHVESDGLVFDPAQITSRTARGVAVPHGDHYHFIPY 
SSLLRELYAKPLSERHVESDGLIFDPAQITSRTANGVAVPHGDHYHFIPY 



tr|Q6WNQ5|Q6WNQ5_STRPN



SKLSALEEKIARMVPISGTGSTVSTNAKPNEWSSLGSLSSNPSSLTTSK 



SQLS PLEEKLARI I PLRYRSNHWVPDSRP- 
SQMSELEERIARI I PLRYRSNHWVPDSRP- 
SQLSPLEEKLARI I PLRYRSNHWVPDSRP- 
SQMSELEERIARIIPLRYRSNHWVPDSRP- 
SQLS PLEEKLARI I PLRYRSNHWVPDSRP- 



EQPSPQSTPEPSPSPQPAPN 
EQPS PQPTPE PS PGPQPAPN 
EQPS PQSTPE PS PS PQPAPN 
EQPSPQPTPEPSPGPQPAPN 
EQPSPQSTPEPSPSPQPAPN 



ELSSASDGYIFNPKDIVEETATAYIVRHGDHFHYIPKSNQIGQPTLPNNS 

PQPAPS N P - - 1 DE KLVKE AVRKVGDG- - Y V FE ENGVPR - Y I PAKD 

-LKIDS N SSLVSQLVRKVGEG — YVFEEKGISR-YVFAKD 

PQPAPS NP — I DE KLVKE AVRKVGDG — YVFEENGVPR-YIPAKD 

-LKIDS N SSLVSQLVRKVGEG — YVFEEKGISR-YVFAKD 

PQPAPS NP — I DE KLVKEAVRKVGDG — YVFEENGVPR-YIPAKD 



LATPSPSLPINPGTSHEKHEEDGYGFDANRIIAEDESGFVMSHGDHNHYF 

LSAET AAGIDSKLAKQESLSHKLGAKK TD LPSSDREFYN 

LPSET VKNLESKLSKQESVSHTLTAKK EN VAPRDQEFYD 

LSAET AAGIDSKLAKQESLSHKLGAKK TD LPSSDREFYN 

LPSET VKNLESKLSKQESVSHTLTAKK EN VAPRDQEFYD 

LSAET AAGIDSKLAKQESLSHKLGAKK TD — ' LPSSDREFYN 



FKKDLTEEQIKAAQKHLEEVKTSHNGLDSLSSHEQDYPSNAKEMKDLDKK 

KAYDLLARIHQDLLDN-KGRQVDFEALDNLLERLKDVSSDKVKLVD D 

KAYNLLTEAHKALFEN-KGRNSDFQALDKLLERLNDESTNKEKLVD D 

KAYDLLARIHQDLLDN-KGRQVDFEALDNLLERLKDVSSDKVKLVD D 

KAYNLLTEAHKALFXN-KGRNSDFQALDKLLERLNDESTNKEKLVD D 

KAYDLLARIHQDLLDN-KGRQVDFEALDNLLERLKDVSSDKVKLVD: D 



IEEKIAGIMKQYGVKRESIWNKEKNAIIYPHGDHHHADPIDEHKPVGIG 

ILAFLAPIRHP ER LGKPNAQITYTD DE I QVAKLAGKY 

LLAFLAPITHP ER LGKPNSQIEYTE DEVRIAQLADKY 

ILAFLAPIRHP ER LGKPNAQITYTD DE I QVAKLAGKY 

LLAFLAPITHP ER LGKPNSQIEYTE DEVRIAQLADKY 

ILAFLAPIRHP ER LGKPNAQITYTD DE I QVAKLAGKY 

: :**: : * * * . . * : :.. 

HSHSNYELFKPEEGVAKKEGNKVYTGEELTNWNLLKNSTFNNQNFTLAN 
TTEDGY-IFDPRD-ITSDEGD-AYVTPHMTHSHWIKKDS-LSEAERAAAQ 
TTSDGY-IFDEHD-IISDEGD-AYVTPHMGHSHWIGKDS-LSDKEKVAAQ 
TTEDGY-IFDPRD-ITSDEGD-AYVTPHMTHSHWIKKDS-LSEAERAAAQ 
TTSDGY-IFDEHD-IISDEGD-AYVTPHMGHSHWIGKDS-LSDKEKVAAQ 
TTEDGY-IFDPRD-ITSDEGD-AYVTPHMTHSHWIKKDS-LSEAERAAAQ 



GQKRVSFSFPPELEKKLGINMLVKLITPDGKVLEKVSGKVFGEGVGNIAN 

AY AKE KGLT P PS TDHQDS GN TEAKGAE AI YNRVKAA KK 

AYTKEKGILPPSPDADVKAN PTGDSAAAIYNRVKGE KR 

AYAKEKGLTPPSTDHQDSGN TEAKGAEAI YNRVKAA KK 

AYTKEKGILPPSPDADVKAN PTGDSAAAIYNRVKGE KR 

AYAKEKGLTPPSTDHQDSGN TEAKGAEAI YNRVKAA KK 



FELDQPYLPGQTFKYTIASKDYPEVSYDGTFTVPTSLAYKMASQTIFYPF 

VPLDR — MP-YNLQYTVEVK NGSLIIP HYDHYHNIKFEWF 

I PLVR — LP- YMVEHTVEVK NGNLI I P HKDHYHNI KFAWF 

VPLDR — MP-YNLQYTVEVK NGSLIIP HYDHYHNIKFEWF 

I PLVR — LP- YMVEHTVEVK NGNLI I P HKDHYHNI KFAWF 
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tr IQ8DQ08 IQ8DQ08 



tr | Q6WNQ5 | Q6WNQ5 
tr|Q8CWR4|Q8CWR4 
tr i Q8DPQ2 i Q8DPQ2 
tr |Q9AG74|Q9AG74 
tr | Q9AHT9 | Q9AHT9 
tr IQ8DQ08 IQ8DQ08 



tr | Q6WNQ5 | Q6WNQ5 
tr | Q8CWR4 | Q8CWR4 
tr IQ8DPQ2 IQ8DPQ2 
tr|Q9AG74|Q9AG74 
tr | Q9AHT9 | Q9AHT9 
tr|Q8DQ08|Q8DQ08' 



tr|Q6WNQ5|Q6WNQ5 
tr|Q8CWR4|Q8CWR4 
tr IQ8DPQ2 IQ8DPQ2 
tr|Q9AG7 4|Q9AG74^ 
tr | Q9AHT9 | Q9AHT9^ 
tr | Q8DQ08 | Q8DQ08 



tr|Q6WNQ5|Q6WNQ5 
tr|Q8CWR4|Q8CWR4 
tr | Q8DPQ2 | Q8DPQ2 
tr | Q9AG74 | Q9AG7 4 
tr|Q9AHT9|Q9AHT9 
tr|Q8DQ08|Q8DQ08' 



tr|Q6WNQ5|Q6WNQ5 
tr|Q8CWR4|Q8CWR4' 
tr | Q8DPQ2 | Q8DPQ2 
tr|Q9AG74|Q9AG74^ 
tr | Q9AHT9 | Q9AHT9 
tr|Q8DQ08|Q8DQ08 



tr|Q6WNQ5|Q6WNQ5 
tr|Q8CWR4|Q8CWR4 
tr|Q8DPQ2|Q8DPQ2 
tr |Q9AG74|Q9AG74 
tr | Q9AHT9 | Q9AHT9 
tr | Q8DQ08 | Q8DQ08 



tr | Q6WNQ5 | Q6WNQ5 
tr | Q8CWR4 | Q8CWR4 
tr|Q8DPQ2|Q8DPQ2^ 
tr | Q9AG74 | Q9AG7 A 
tr | Q9AHT9 | Q9AHT9^ 
tr|Q8DQ08|Q8DQ08^ 



_STRR6 VPLDR — MP-YNLQYTVEVK NGSLIIP HYDHYHNIKFEWF 

* . . * 



STRPN HAGDT Y LRVNPQ FAV PKGTDALVR V FDE FHGNAY LENNY KVGE I KL P I PK 

STRR6 DEGLYEAPKGYSLEDLLATVKYYVE-HPNERPHSDNGFGNASDHVQR 

STRR6 DDHT Y KAPNG Y T LE D L FAT I K Y Y VE - KPDERPKS ND GW GNAS E KVL G 

STRPN DE GLYEAPKGYS LEDLLATVKY YVE - HPNE R PHSDNGFGNASDHVQR 

_STRPN DDHTYKAPNGYTLEDLFATIKYYVE-HPDERPHSNDGWGNASEHVLG 

STRR6 DE GLYEAPKGYS LEDLLATVKY YVE -HPNER PHSDNGFGNASDHVQR 



STRPN LNQGTTRTAGNKI PVTFMANAYLDNQSTY I VEVPI LEKENQTD 

STRR6 NKNGQADTNQTEKPNEEKPQTEKPEEETPREEKPQSEKPES 

STRR6 KKDHSEDPNKNFKADEE 

STRPN NKNGQADTNQTEKPNEEKPQTEKPEEETPREEKPQSEKPES 

STRPN KKDHSEDPNKNFKADEE 

STRR6 NKNGQADTNQTEKPNEEKPQTEKPEEDKEHDEVSEPTHPESDEKENHVGL 



_STRPN KPSILPQFKRNKAQENSKFDEKVEEPKTSEKVEKEKLSETGN 

_STRR6 -P KP TEEPEEESPEES — PEESEEPQVETEKVKEKLREA — 

STRR6 P VEET — PAE PEVPQVETEKVEAQLKEA — 

STRPN -P KP TEEPEEESPEES — PEESEEPQVETEKVKEKLREA — 

STRPN P VEET — PAE PEVPQVETEKVEAQLKEA — 

_STRR6 NPSADNLYKPSTDTEETEEEA-EDT — TDEAEI PQVEHSVINAKIAEA — 

★ *:: :**:... :::*: 

_STRPN STSNSTLEEVPTVDPVQEKVAKFAESYGMKLENVLFNMDGTIELYLPSGE 

JSTRR6 EDLLGKIQ — NPIIKSNAKETLT-GLK-NNLLFGTQDNNTIMAEA — 

STRR6 EVLLAKVT — DSSLKANATETLA-GLR-NNLTLQIMDNNSIMAEA — 

STRPN EDLLGKIQ — NPIIKSNAKETLT-GLK-NNLLFGTQDNNTIMAEA — 

STRPN EVLLAKVT — DS S LKANATETLA- GLR-NNLTLQ IMDNNS IMAEA — 

STRR6 EALLEKVT — DS S I RQNAVETLT- GLK- S S LLLGTKDNNTI SAEV — 

STRPN VIKKNMADFTGEAPQGNGENKPSENGKVSTGTVENQPTENKPADSLPEAP 

STRR6 — EKLLALLKESK 

STRR6 — EKLLALLKGSNPSSVSKEKIN 

STRPN — EKLLALLKESK 

STRPN — EKLLALLKGSNPSSVSKEKIN 

STRR6 — DSLLALLKESQPTPIQ 

STRPN NEKPVKPENSTDNGMLNPEGNVGSDPMLDPALEEAPAVDPVQEKLEKFTA 

STRR6 

STRR6 

STRPN 

STRPN ■ 

STRR6 

STRPN SYGLGLDSVIFNMDGTIELRLPSGEVIKKNLSDLIA 

STRR6 

STRR6 

STRPN 

STRPN 

STRR6 
PileUp

MSF: 1086  Type: P  Check: 1584

Name: tr|Q6WNQ5|Q6WNQ5_STRPN  oo  Len: 1086  Check: 2031  Weight: 0.100
Name: tr|Q8CWR4|Q8CWR4_STRR6  oo  Len: 1086  Check: 2995  Weight: 0.100
Name: tr|Q8DPQ2|Q8DPQ2_STRR6  oo  Len: 1086  Check: 7473  Weight: 0.100
Name: tr|Q9AG74|Q9AG74_STRPN  oo  Len: 1086  Check: 1008  Weight: 0.100
Name: tr|Q9AHT9|Q9AHT9_STRPN  oo  Len: 1086  Check: 5019  Weight: 0.100
Name: tr|Q8DQ08|Q8DQ08_STRR6  oo  Len: 1086  Check: 3058  Weight: 0.100

//
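The Check values in this header appear to follow the GCG checksum convention. The following is a minimal sketch, assuming the standard GCG algorithm (each character's uppercase ASCII code is weighted by its position, the weight cycles from 1 to 57, and the total is taken modulo 10000) and assuming that the MSF header Check is the sum of the per-sequence Check values modulo 10000. Under those assumptions the six Name-line values reproduce the header value of 1584. The function names are illustrative only. The per-sequence values are not recomputed here because the full aligned sequences are not all legible in this copy.

```python
def gcg_checksum(seq: str) -> int:
    """GCG-style checksum: position weights cycle 1..57, result modulo 10000."""
    total = 0
    for i, ch in enumerate(seq):
        total += ((i % 57) + 1) * ord(ch.upper())
    return total % 10000


def msf_header_check(sequence_checks: list[int]) -> int:
    """Combine per-sequence Check values into the MSF header Check (assumed rule)."""
    return sum(sequence_checks) % 10000


# Per-sequence Check values copied from the Name lines of the MSF header.
name_checks = {
    "tr|Q6WNQ5|Q6WNQ5_STRPN": 2031,
    "tr|Q8CWR4|Q8CWR4_STRR6": 2995,
    "tr|Q8DPQ2|Q8DPQ2_STRR6": 7473,
    "tr|Q9AG74|Q9AG74_STRPN": 1008,
    "tr|Q9AHT9|Q9AHT9_STRPN": 5019,
    "tr|Q8DQ08|Q8DQ08_STRR6": 3058,
}

if __name__ == "__main__":
    print(msf_header_check(list(name_checks.values())))  # 1584
```

Because the weights restart after every 57 characters, the checksum detects most single-character changes but not all reorderings, so it works as a quick integrity check on transcription rather than as a unique identifier.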



tr|Q6WNQ5|Q6WNQ5_STRPN                                    CAYALNQHR SQENK.DNNR
tr|Q8CWR4|Q8CWR4_STRR6  .MNQIYLRKE ERMKINKKYL AGSVATLVLS VCAYELGLHQ AQTVK.ENNR
tr|Q8DPQ2|Q8DPQ2_STRR6  MQLEISNRKR VSMKINKKYL VGSAAALILS VCSYELGLYQ ARTVK.ENNR
tr|Q9AG74|Q9AG74_STRPN               MKINKKYL VGSAAALILS VCSYELGLYQ ARTVK.ENNR
tr|Q9AHT9|Q9AHT9_STRPN               MKINKKYL VGSAAALILS VCSYELGLYQ ARTVK.ENNR
tr|Q8DQ08|Q8DQ08_STRR6               MKINKKYL AGSVAVLALS VCSYELGRHQ AGQVKKESNR



tr|Q6WNQ5|Q6WNQ5_STRPN  VSYVDGSQSS QKSENLTPDQ VSQKEGIQAE QIVIKITDQG YVTSHGDHYH
tr|Q8CWR4|Q8CWR4_STRR6  VSYIDGKQAT QKTENLTPDE VSKREGINAE QIVIKITDQG YVTSHGDHYH
tr|Q8DPQ2|Q8DPQ2_STRR6  VSYIDGKQAT QKTENLTPDE VSKREGINAE QIVIKITDQG YVTSHGDHYH
tr|Q9AG74|Q9AG74_STRPN  VSYIDGKQAT QKTENLTPDE VSKREGINAE QIVIKITDQG YVTSHGDHYH
tr|Q9AHT9|Q9AHT9_STRPN  VSYIDGKQAT QKTENLTPDE VSKREGINAE QIVIKITDQG YVTSHGDHYH
tr|Q8DQ08|Q8DQ08_STRR6  VSYIDGDQAG QKAENLTPDE VSKREGINAE QIVIKITDQG YVTSHGDHYH



tr|Q6WNQ5|Q6WNQ5_STRPN  YYNGKVPYDA LFSEELLMKD PNYQLKDADI VNEVKGGYII KVDGKYYVYL
tr|Q8CWR4|Q8CWR4_STRR6  YYNGKVPYDA IISEELLMKD PNYQLKDEDI ISEIKGGYVI KVDGKYYVYL
tr|Q8DPQ2|Q8DPQ2_STRR6  YYNGKVPYDA IFSEELLMKD PNYKLKDEDI VNEVKGGYVI KVDGKYYVYL
tr|Q9AG74|Q9AG74_STRPN  YYNGKVPYDA IISEELLMKD PNYQLKDEDI ISEIKGGYVI KVDGKYYVYL
tr|Q9AHT9|Q9AHT9_STRPN  YYNGKVPYDA IISEELLMKD PNYKLKDEDI VNEVKGGYVI KVDGKYYVYL
tr|Q8DQ08|Q8DQ08_STRR6  YYNGKVPYDA IISEELLMKD PNYQLKDSDI VNEIKGGYVI KVDGKYYVYL



tr|Q6WNQ5|Q6WNQ5_STRPN  KDAAHADNVR TKDEINRQKQ EHVKDNE... .KVNSNVAVA RSQGRYTTND
tr|Q8CWR4|Q8CWR4_STRR6  KDAAHADNVR TKEEINRQKQ EHSQHREGGT PRNDGAVALA RSQGRYTTDD
tr|Q8DPQ2|Q8DPQ2_STRR6  KDAAHADNVR TKEEINRQKQ EHSQHREGGT PRNDGAVALA RSQGRYTTDD
tr|Q9AG74|Q9AG74_STRPN  KDAAHADNVR TKEEINRQKQ EHSQHREGGT PRNDGAVALA RSQGRYTTDD
tr|Q9AHT9|Q9AHT9_STRPN  KDAAHADNVR TKEEINRQKQ EHSQHREGGT PRNDGAVALA RSQGRYTTDD
tr|Q8DQ08|Q8DQ08_STRR6  KDAAHADNIR TKEEIKRQKQ ERSHNHN... SRADNAVAAA RAQGRYTTDD



tr|Q6WNQ5|Q6WNQ5_STRPN  GYVFNPADII EDTGNAYIVP HRGHYHYIPK SDLSASELAA AKAHLAGK..
tr|Q8CWR4|Q8CWR4_STRR6  GYIFNASDII EDTGDAYIVP HGDHYHYIPK NELSASELAA AKAFLSGRGN
tr|Q8DPQ2|Q8DPQ2_STRR6  GYIFNASDII EDTGDAYIVP HGDHYHYIPK NELSASELAA AEAFLSGRGN
tr|Q9AG74|Q9AG74_STRPN  GYIFNASDII EDTGDAYIVP HGDHYHYIPK NELSASELAA AKAFLSGRGN
tr|Q9AHT9|Q9AHT9_STRPN  GYIFNASDII EDTGDAYIVP HGDHYHYIPK NELSASELAA AEAFLSGRGN
tr|Q8DQ08|Q8DQ08_STRR6  GYIFNASDII EDTGDAYIVP HGDHYHYIPK SDLSASELAA AQAYWNGK..



tr|Q6WNQ5|Q6WNQ5_STRPN  .......... NMQP.SQLSY SSTASD...N NTQSVAKGST SKPANKSENL
tr|Q8CWR4|Q8CWR4_STRR6  LSNSRTYRRQ NSDNTSRTNW VPSVSNPGTT NTNTSNNSNT NSQASQSNDI
tr|Q8DPQ2|Q8DPQ2_STRR6  LSNSRTYRRQ NSDNTSRTNW VPSVSNPGTT NTNTSNNSNT NSQASQSNDI
tr|Q9AG74|Q9AG74_STRPN  LSNSRTYRRQ NSDNTSRTNW VPSVSNPGTT NTNTSNNSNT NSQASQSNDI
tr|Q9AHT9|Q9AHT9_STRPN  LSNSRTYRRQ NSDNTSRTNW VPSVSNPGTT NTNTSNNSNT NSQASQSNDI
tr|Q8DQ08|Q8DQ08_STRR6  .........Q GSRPSSSSSH NANPAQPRLS ENHNLTVTPT YHQN.QGENI



tr|Q6WNQ5|Q6WNQ5_STRPN  QSLLKELYDS PSAQRYSESD GLVFDPAKII SRTPNGVAIP HGDHYHFIPY
tr|Q8CWR4|Q8CWR4_STRR6  DSLLKQLYKL PLSQRHVESD GLIFDPAQIT SRTANGVAVP HGDHYHFIPY
tr|Q8DPQ2|Q8DPQ2_STRR6  DSLLKQLYKL PLSQRHVESD GLVFDPAQIT SRTARGVAVP HGDHYHFIPY
tr|Q9AG74|Q9AG74_STRPN  DSLLKQLYKL PLSQRHVESD GLIFDPAQIT SRTANGVAVP HGDHYHFIPY
tr|Q9AHT9|Q9AHT9_STRPN  DSLLKQLYKL PLSQRHVESD GLVFDPAQIT SRTARGVAVP HGDHYHFIPY
tr|Q8DQ08|Q8DQ08_STRR6  SSLLRELYAK PLSERHVESD GLIFDPAQIT SRTANGVAVP HGDHYHFIPY



tr|Q6WNQ5|Q6WNQ5_STRPN  SKLSALEEKI ARMVPISGTG STVSTNAKPN EVVSSLGSLS SNPSSLTTSK
tr|Q8CWR4|Q8CWR4_STRR6  SQLSPLEEKL ARIIPLRYRS NHWVPDSRP. EQPSPQSTPE PSPSPQPAPN
tr|Q8DPQ2|Q8DPQ2_STRR6  SQMSELEERI ARIIPLRYRS NHWVPDSRP. EQPSPQPTPE PSPGPQPAPN
tr|Q9AG74|Q9AG74_STRPN  SQLSPLEEKL ARIIPLRYRS NHWVPDSRP. EQPSPQSTPE PSPSPQPAPN
tr|Q9AHT9|Q9AHT9_STRPN  SQMSELEERI ARIIPLRYRS NHWVPDSRP. EQPSPQPTPE PSPGPQPAPN
tr|Q8DQ08|Q8DQ08_STRR6  SQLSPLEEKL ARIIPLRYRS NHWVPDSRP. EQPSPQSTPE PSPSPQPAPN



tr|Q6WNQ5|Q6WNQ5_STRPN  ELSSASDGYI FNPKDIVEET ATAYIVRHGD HFHYIPKSNQ IGQPTLPNNS
tr|Q8CWR4|Q8CWR4_STRR6  PQPAPS.... .NP..IDEKL VKEAVRKVGD G..YVFEENG VPR.YIPAKD
tr|Q8DPQ2|Q8DPQ2_STRR6  .LKIDS.... .N.....SSL VSQLVRKVGE G..YVFEEKG ISR.YVFAKD
tr|Q9AG74|Q9AG74_STRPN  PQPAPS.... .NP..IDEKL VKEAVRKVGD G..YVFEENG VPR.YIPAKD
tr|Q9AHT9|Q9AHT9_STRPN  .LKIDS.... .N.....SSL VSQLVRKVGE G..YVFEEKG ISR.YVFAKD
tr|Q8DQ08|Q8DQ08_STRR6  PQPAPS.... .NP..IDEKL VKEAVRKVGD G..YVFEENG VPR.YIPAKD



tr|Q6WNQ5|Q6WNQ5_STRPN  LATPSPSLPI NPGTSHEKHE EDGYGFDANR IIAEDESGFV MSHGDHNHYF
tr|Q8CWR4|Q8CWR4_STRR6  LSAET...AA GIDSKLAKQE SLSHKLGAKK ...TD..... LPSSDREFYN
tr|Q8DPQ2|Q8DPQ2_STRR6  LPSET...VK NLESKLSKQE SVSHTLTAKK ...EN..... VAPRDQEFYD
tr|Q9AG74|Q9AG74_STRPN  LSAET...AA GIDSKLAKQE SLSHKLGAKK ...TD..... LPSSDREFYN
tr|Q9AHT9|Q9AHT9_STRPN  LPSET...VK NLESKLSKQE SVSHTLTAKK ...EN..... VAPRDQEFYD
tr|Q8DQ08|Q8DQ08_STRR6  LSAET...AA GIDSKLAKQE SLSHKLGAKK ...TD..... LPSSDREFYN



tr|Q6WNQ5|Q6WNQ5_STRPN  FKKDLTEEQI KAAQKHLEEV KTSHNGLDSL SSHEQDYPSN AKEMKDLDKK
tr|Q8CWR4|Q8CWR4_STRR6  KAYDLLARIH QDLLDN.KGR QVDFEALDNL LERLKDVSSD KVKLVD...D
tr|Q8DPQ2|Q8DPQ2_STRR6  KAYNLLTEAH KALFEN.KGR NSDFQALDKL LERLNDESTN KEKLVD...D
tr|Q9AG74|Q9AG74_STRPN  KAYDLLARIH QDLLDN.KGR QVDFEALDNL LERLKDVSSD KVKLVD...D
tr|Q9AHT9|Q9AHT9_STRPN  KAYNLLTEAH KALFXN.KGR NSDFQALDKL LERLNDESTN KEKLVD...D
tr|Q8DQ08|Q8DQ08_STRR6  KAYDLLARIH QDLLDN.KGR QVDFEALDNL LERLKDVSSD KVKLVD...D



tr|Q6WNQ5|Q6WNQ5_STRPN  IEEKIAGIMK QYGVKRESIV VNKEKNAIIY PHGDHHHADP IDEHKPVGIG
tr|Q8CWR4|Q8CWR4_STRR6  ILAFLAPIRH P...ER.... LGKPNAQITY TD......DE IQVAKLAGKY
tr|Q8DPQ2|Q8DPQ2_STRR6  LLAFLAPITH P...ER.... LGKPNSQIEY TE......DE VRIAQLADKY
tr|Q9AG74|Q9AG74_STRPN  ILAFLAPIRH P...ER.... LGKPNAQITY TD......DE IQVAKLAGKY
tr|Q9AHT9|Q9AHT9_STRPN  LLAFLAPITH P...ER.... LGKPNSQIEY TE......DE VRIAQLADKY
tr|Q8DQ08|Q8DQ08_STRR6  ILAFLAPIRH P...ER.... LGKPNAQITY TD......DE IQVAKLAGKY


tr|Q6WNQ5|Q6WNQ5_STRPN  HSHSNYELFK PEEGVAKKEG NKVYTGEELT NVVNLLKNST FNNQNFTLAN
tr|Q8CWR4|Q8CWR4_STRR6  TTEDGY.IFD PRD.ITSDEG D.AYVTPHMT HSHWIKKDS. LSEAERAAAQ
tr|Q8DPQ2|Q8DPQ2_STRR6  TTSDGY.IFD EHD.IISDEG D.AYVTPHMG HSHWIGKDS. LSDKEKVAAQ
tr|Q9AG74|Q9AG74_STRPN  TTEDGY.IFD PRD.ITSDEG D.AYVTPHMT HSHWIKKDS. LSEAERAAAQ
tr|Q9AHT9|Q9AHT9_STRPN  TTSDGY.IFD EHD.IISDEG D.AYVTPHMG HSHWIGKDS. LSDKEKVAAQ
tr|Q8DQ08|Q8DQ08_STRR6  TTEDGY.IFD PRD.ITSDEG D.AYVTPHMT HSHWIKKDS. LSEAERAAAQ
tr|Q6WNQ5|Q6WNQ5_STRPN  GQKRVSFSFP PELEKKLGIN MLVKLITPDG KVLEKVSGKV FGEGVGNIAN
tr|Q8CWR4|Q8CWR4_STRR6  AYAKEKGLTP PSTDHQDSGN TEA....... KGAEAIYNRV KAA.....KK
tr|Q8DPQ2|Q8DPQ2_STRR6  AYTKEKGILP PSPDADVKAN PTG....... DSAAAIYNRV KGE.....KR
tr|Q9AG74|Q9AG74_STRPN  AYAKEKGLTP PSTDHQDSGN TEA....... KGAEAIYNRV KAA.....KK
tr|Q9AHT9|Q9AHT9_STRPN  AYTKEKGILP PSPDADVKAN PTG....... DSAAAIYNRV KGE.....KR
tr|Q8DQ08|Q8DQ08_STRR6  AYAKEKGLTP PSTDHQDSGN TEA....... KGAEAIYNRV KAA.....KK


tr|Q6WNQ5|Q6WNQ5_STRPN  FELDQPYLPG QTFKYTIASK DYPEVSYDGT FTVPTSLAYK MASQTIFYPF
tr|Q8CWR4|Q8CWR4_STRR6  VPLDR..MP. YNLQYTVEVK NGS....... LIIP...HYD HYHNIKFEWF
tr|Q8DPQ2|Q8DPQ2_STRR6  IPLVR..LP. YMVEHTVEVK NGN....... LIIP...HKD HYHNIKFAWF
tr|Q9AG74|Q9AG74_STRPN  VPLDR..MP. YNLQYTVEVK NGS....... LIIP...HYD HYHNIKFEWF
tr|Q9AHT9|Q9AHT9_STRPN  IPLVR..LP. YMVEHTVEVK NGN....... LIIP...HKD HYHNIKFAWF
tr|Q8DQ08|Q8DQ08_STRR6  VPLDR..MP. YNLQYTVEVK NGS....... LIIP...HYD HYHNIKFEWF


tr|Q6WNQ5|Q6WNQ5_STRPN  HAGDTYLRVN PQFAVPKGTD ALVRVFDEFH GNAYLENNYK VGEIKLPIPK
tr|Q8CWR4|Q8CWR4_STRR6  ...DEGLYEA PKGYSLEDLL ATVKYYVE.H PNERPHSDNG FGNASDHVQR
tr|Q8DPQ2|Q8DPQ2_STRR6  ...DDHTYKA PNGYTLEDLF ATIKYYVE.H PDERPHSNDG WGNASEHVLG
tr|Q9AG74|Q9AG74_STRPN  ...DEGLYEA PKGYSLEDLL ATVKYYVE.H PNERPHSDNG FGNASDHVQR
tr|Q9AHT9|Q9AHT9_STRPN  ...DDHTYKA PNGYTLEDLF ATIKYYVE.H PDERPHSNDG WGNASEHVLG
tr|Q8DQ08|Q8DQ08_STRR6  ...DEGLYEA PKGYSLEDLL ATVKYYVE.H PNERPHSDNG FGNASDHVQR


tr|Q6WNQ5|Q6WNQ5_STRPN  LNQGTTRTAG NKIPVTFMAN AYLDNQSTYI VEVPILEKEN QTD.......
tr|Q8CWR4|Q8CWR4_STRR6  NKNGQADTNQ TEKPNEEKPQ TEKPEEETPR EEKPQSEKPE S.........
tr|Q8DPQ2|Q8DPQ2_STRR6  KKDHSEDPNK NFKADEE... .......... .......... ..........
tr|Q9AG74|Q9AG74_STRPN  NKNGQADTNQ TEKPNEEKPQ TEKPEEETPR EEKPQSEKPE S.........
tr|Q9AHT9|Q9AHT9_STRPN  KKDHSEDPNK NFKADEE... .......... .......... ..........
tr|Q8DQ08|Q8DQ08_STRR6  NKNGQADTNQ TEKPNEEKPQ TEKPEEDKEH DEVSEPTHPE SDEKENHVGL


tr|Q6WNQ5|Q6WNQ5_STRPN  ........KP SILPQFKRNK AQENSKFDEK VEEPKTSEKV EKEKLSETGN
tr|Q8CWR4|Q8CWR4_STRR6  .P......KP ...TEEPEEE SPEES..PEE SEEPQVETEK VKEKLREA..
tr|Q8DPQ2|Q8DPQ2_STRR6  P......... .......... .VEET..PAE PEVPQVETEK VEAQLKEA..
tr|Q9AG74|Q9AG74_STRPN  .P......KP ...TEEPEEE SPEES..PEE SEEPQVETEK VKEKLREA..
tr|Q9AHT9|Q9AHT9_STRPN  P......... .......... .VEET..PAE PEVPQVETEK VEAQLKEA..
tr|Q8DQ08|Q8DQ08_STRR6  NPSADNLYKP STDTEETEEE A.EDT..TDE AEIPQVEHSV INAKIAEA..


tr|Q6WNQ5|Q6WNQ5_STRPN  STSNSTLEEV PTVDPVQEKV AKFAESYGMK LENVLFNMDG TIELYLPSGE
tr|Q8CWR4|Q8CWR4_STRR6  ...EDLLGKI Q..NPIIKSN AKETLT.GLK .NNLLFGTQD NNTIMAEA..
tr|Q8DPQ2|Q8DPQ2_STRR6  ...EVLLAKV T..DSSLKAN ATETLA.GLR .NNLTLQIMD NNSIMAEA..
tr|Q9AG74|Q9AG74_STRPN  ...EDLLGKI Q..NPIIKSN AKETLT.GLK .NNLLFGTQD NNTIMAEA..
tr|Q9AHT9|Q9AHT9_STRPN  ...EVLLAKV T..DSSLKAN ATETLA.GLR .NNLTLQIMD NNSIMAEA..
tr|Q8DQ08|Q8DQ08_STRR6  ...EALLEKV T..DSSIRQN AVETLT.GLK .SSLLLGTKD NNTISAEV..


tr|Q6WNQ5|Q6WNQ5_STRPN  VIKKNMADFT GEAPQGNGEN KPSENGKVST GTVENQPTEN KPADSLPEAP
tr|Q8CWR4|Q8CWR4_STRR6  ..EKLLALLK ESK
tr|Q8DPQ2|Q8DPQ2_STRR6  ..EKLLALLK GSNPSSVSKE KIN
tr|Q9AG74|Q9AG74_STRPN  ..EKLLALLK ESK
tr|Q9AHT9|Q9AHT9_STRPN  ..EKLLALLK GSNPSSVSKE KIN
tr|Q8DQ08|Q8DQ08_STRR6  ..DSLLALLK ESQPTPIQ..








tr|Q6WNQ5|Q6WNQ5_STRPN  NEKPVKPENS TDNGMLNPEG NVGSDPMLDP ALEEAPAVDP VQEKLEKFTA
tr|Q8CWR4|Q8CWR4_STRR6
tr|Q8DPQ2|Q8DPQ2_STRR6
tr|Q9AG74|Q9AG74_STRPN
tr|Q9AHT9|Q9AHT9_STRPN
tr|Q8DQ08|Q8DQ08_STRR6



tr|Q6WNQ5|Q6WNQ5_STRPN  SYGLGLDSVI FNMDGTIELR LPSGEVIKKN LSDLIA
tr|Q8CWR4|Q8CWR4_STRR6
tr|Q8DPQ2|Q8DPQ2_STRR6
tr|Q9AG74|Q9AG74_STRPN
tr|Q9AHT9|Q9AHT9_STRPN
tr|Q8DQ08|Q8DQ08_STRR6
tr|Q6WNQ7|Q6WNQ7_STRPN Surface protein BVH-3 [bvh-3] [Streptococcus pneumoniae] 1039 AA

Score = 1134 bits (2933), Expect = 0.0
Identities = 565/567 (99%), Positives = 565/567 (99%)

Query: 1     LTEEQIKAAQKHLEEVKTSHNGLDSLSSHEQDYPSNAKEMKDLDKKIEEKIAGIMKQYGV 60
             LTEEQIKAAQKHLEEVKTSHNGLDSLSSHEQDYP NAKEMKDLDKKIEEKIAGIMKQYGV
Sbjct: 473   LTEEQIKAAQKHLEEVKTSHNGLDSLSSHEQDYPGNAKEMKDLDKKIEEKIAGIMKQYGV 532

Query: 61    KRESIVVNKEKNAIIYPHGDHHHADPIDEHKPVGIGHSHSNYELFKPEEGVAKKEGNKVY 120
             KRESIVVNKEKNAIIYPHGDHHHADPIDEHKPVGIGHSHSNYELFKPEEGVAKKEGNKVY
Sbjct: 533   KRESIVVNKEKNAIIYPHGDHHHADPIDEHKPVGIGHSHSNYELFKPEEGVAKKEGNKVY 592

Query: 121   TGEELTNVVNLLKNSTFNNQNFTLANGQKRVSFSFPPELEKKLGINMLVKLITPDGKVLE 180
             TGEELTNVVNLLKNSTFNNQNFTLANGQKRVSFSFPPELEKKLGINMLVKLITPDGKVLE
Sbjct: 593   TGEELTNVVNLLKNSTFNNQNFTLANGQKRVSFSFPPELEKKLGINMLVKLITPDGKVLE 652

Query: 181   KVSGKVFGEGVGNIANFELDQPYLPGQTFKYTIASKDYPEVSYDGTFTVPTSLAYKMASQ 240
             KVSGKVFGEGVGNIANFELDQPYLPGQTFKYTIASKDYPEVSYDGTFTVPTSLAYKMASQ
Sbjct: 653   KVSGKVFGEGVGNIANFELDQPYLPGQTFKYTIASKDYPEVSYDGTFTVPTSLAYKMASQ 712

Query: 241   TIFYPFHAGDTYLRVNPQFAVPKGTDALVRVFDEFHGNAYLENNYKVGEIKLPIPKLNQG 300
             TIFYPFHAGDTYLRVNPQFAVPKGTDALVRVFDEFHGNAYLENNYKVGEIKLPIPKLNQG
Sbjct: 713   TIFYPFHAGDTYLRVNPQFAVPKGTDALVRVFDEFHGNAYLENNYKVGEIKLPIPKLNQG 772

Query: 301   TTRTAGNKIPVTFMANAYLDNQSTYIVEVPILEKENQTDKPSILPQFKRNKAQENLKLDE 360
             TTRTAGNKIPVTFMANAYLDNQSTYIVEVPILEKENQTDKPSILPQFKRNKAQEN KLDE
Sbjct: 773   TTRTAGNKIPVTFMANAYLDNQSTYIVEVPILEKENQTDKPSILPQFKRNKAQENSKLDE 832

Query: 361   KVEEPKTSEKVEKEKLSETGNSTSNSTLEEVPTVDPVQEKVAKFAESYGMKLENVLFNMD 420
             KVEEPKTSEKVEKEKLSETGNSTSNSTLEEVPTVDPVQEKVAKFAESYGMKLENVLFNMD
Sbjct: 833   KVEEPKTSEKVEKEKLSETGNSTSNSTLEEVPTVDPVQEKVAKFAESYGMKLENVLFNMD 892

Query: 421   GTIELYLPSGEVIKKNMADFTGEAPQGNGENKPSENGKVSTGTVENQPTENKPADSLPEA 480
             GTIELYLPSGEVIKKNMADFTGEAPQGNGENKPSENGKVSTGTVENQPTENKPADSLPEA
Sbjct: 893   GTIELYLPSGEVIKKNMADFTGEAPQGNGENKPSENGKVSTGTVENQPTENKPADSLPEA 952

Query: 481   PNEKPVKPENSTDNGMLNPEGNVGSDPMLDPALEEAPAVDPVQEKLEKFTASYGLGLDSV 540
             PNEKPVKPENSTDNGMLNPEGNVGSDPMLDPALEEAPAVDPVQEKLEKFTASYGLGLDSV
Sbjct: 953   PNEKPVKPENSTDNGMLNPEGNVGSDPMLDPALEEAPAVDPVQEKLEKFTASYGLGLDSV 1012

Query: 541   IFNMDGTIELRLPSGEVIKKNLSDLIA 567
             IFNMDGTIELRLPSGEVIKKNLSDLIA
Sbjct: 1013  IFNMDGTIELRLPSGEVIKKNLSDLIA 1039
tr|Q6WNQ5|Q6WNQ5_STRPN Surface protein BVH-3 (Fragment) [bvh-3] [Streptococcus pneumoniae] 1019 AA

Score = 1504 bits (3893), Expect = 0.0
Identities = 743/779 (95%), Positives = 743/779 (95%)

Query: 1     AYALNQHRSQENKDNNRVSYVDGSQSSQKSENLTPDQVSQKEGIQAEQIVIKITDQGYVT 60
             AYALNQHRSQENKDNNRVSYVDGSQSSQKSENLTPDQVSQKEGIQAEQIVIKITDQGYVT
Sbjct: 2     AYALNQHRSQENKDNNRVSYVDGSQSSQKSENLTPDQVSQKEGIQAEQIVIKITDQGYVT 61

Query: 61    SHGDHYHYYNGKVPYDALFSEELLMKDPNYQLKDADIVNEVKGGYIIKVDGKYYVYLKDA 120
             SHGDHYHYYNGKVPYDALFSEELLMKDPNYQLKDADIVNEVKGGYIIKVDGKYYVYLKDA
Sbjct: 62    SHGDHYHYYNGKVPYDALFSEELLMKDPNYQLKDADIVNEVKGGYIIKVDGKYYVYLKDA 121

Query: 121   AHADNVRTKDEINRQKQEHVKDNEKVNSNVAVARSQGRYTTNDGYVFNPADIIEDTGNAY 180
             AHADNVRTKDEINRQKQEHVKDNEKVNSNVAVARSQGRYTTNDGYVFNPADIIEDTGNAY
Sbjct: 122   AHADNVRTKDEINRQKQEHVKDNEKVNSNVAVARSQGRYTTNDGYVFNPADIIEDTGNAY 181

Query: 181   IVPHGGHYHYIPXXXXXXXXXXXXXXXXXXXNMQPSQLSYSSTASDNNTQSVAKGSTSKP 240
             IVPH GHYHYIP                   NMQPSQLSYSSTASDNNTQSVAKGSTSKP
Sbjct: 182   IVPHRGHYHYIPKSDLSASELAAAKAHLAGKNMQPSQLSYSSTASDNNTQSVAKGSTSKP 241

Query: 241   ANKSENLQSLLKELYDSPSAQRYSESDGLVFDPAKIISRTPNGVAIPHGDHYHFIPYSKL 300
             ANKSENLQSLLKELYDSPSAQRYSESDGLVFDPAKIISRTPNGVAIPHGDHYHFIPYSKL
Sbjct: 242   ANKSENLQSLLKELYDSPSAQRYSESDGLVFDPAKIISRTPNGVAIPHGDHYHFIPYSKL 301

Query: 301   SALEEKIARMVPISGTGSTVSTNAKPNEVVXXXXXXXXXXXXXXXXKELSSASDGYIFNP 360
             SALEEKIARMVPISGTGSTVSTNAKPNEVV                KELSSASDGYIFNP
Sbjct: 302   SALEEKIARMVPISGTGSTVSTNAKPNEVVSSLGSLSSNPSSLTTSKELSSASDGYIFNP 361

Query: 361   KDIVEETATAYIVRHGDHFHYIPKSNQIGQPTLPNNSLATPSPSLPINPGTSHEKHEEDG 420
             KDIVEETATAYIVRHGDHFHYIPKSNQIGQPTLPNNSLATPSPSLPINPGTSHEKHEEDG
Sbjct: 362   KDIVEETATAYIVRHGDHFHYIPKSNQIGQPTLPNNSLATPSPSLPINPGTSHEKHEEDG 421

Query: 421   YGFDANRIIAEDESGFVMSHGDHNHYFFKKDLTEEQIKAAQKHLEEVKTSHNGLDSLSSH 480
             YGFDANRIIAEDESGFVMSHGDHNHYFFKKDLTEEQIKAAQKHLEEVKTSHNGLDSLSSH
Sbjct: 422   YGFDANRIIAEDESGFVMSHGDHNHYFFKKDLTEEQIKAAQKHLEEVKTSHNGLDSLSSH 481

Query: 481   EQDYPSNAKEMKDLDKKIEEKIAGIMKQYGVKRESIVVNKEKNAIIYPHGDHHHADPIDE 540
             EQDYPSNAKEMKDLDKKIEEKIAGIMKQYGVKRESIVVNKEKNAIIYPHGDHHHADPIDE
Sbjct: 482   EQDYPSNAKEMKDLDKKIEEKIAGIMKQYGVKRESIVVNKEKNAIIYPHGDHHHADPIDE 541

Query: 541   HKPVGIGHSHSNYELFKPEEGVAKKEGNKVYTGEELTNVVNLLKNSTFNNQNFTLANGQK 600
             HKPVGIGHSHSNYELFKPEEGVAKKEGNKVYTGEELTNVVNLLKNSTFNNQNFTLANGQK
Sbjct: 542   HKPVGIGHSHSNYELFKPEEGVAKKEGNKVYTGEELTNVVNLLKNSTFNNQNFTLANGQK 601

Query: 601   RVSFSFPPELEKKLGINMLVKLITPDGKVLEKVSGKVFGEGVGNIANFELDQPYLPGQTF 660
             RVSFSFPPELEKKLGINMLVKLITPDGKVLEKVSGKVFGEGVGNIANFELDQPYLPGQTF
Sbjct: 602   RVSFSFPPELEKKLGINMLVKLITPDGKVLEKVSGKVFGEGVGNIANFELDQPYLPGQTF 661

Query: 661   KYTIASKDYPEVSYDGTFTVPTSLAYKMASQTIFYPFHAGDTYLRVNPQFAVPKGTDALV 720
             KYTIASKDYPEVSYDGTFTVPTSLAYKMASQTIFYPFHAGDTYLRVNPQFAVPKGTDALV
Sbjct: 662   KYTIASKDYPEVSYDGTFTVPTSLAYKMASQTIFYPFHAGDTYLRVNPQFAVPKGTDALV 721

Query: 721   RVFDEFHGNAYLENNYKVGEIKLPIPKLNQGTTRTAGNKIPVTFMANAYLDNQSTYIVE 779
             RVFDEFHGNAYLENNYKVGEIKLPIPKLNQGTTRTAGNKIPVTFMANAYLDNQSTYIVE
Sbjct: 722   RVFDEFHGNAYLENNYKVGEIKLPIPKLNQGTTRTAGNKIPVTFMANAYLDNQSTYIVE 780
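Each Identities line reports identical aligned positions over the aligned length. The following is a minimal sketch of that arithmetic, with hypothetical helper names. Gap characters, and query positions masked to X by low-complexity filtering, never count as identities. The percentage is truncated to a whole number, not rounded; truncation reproduces all four figures on these pages (565/567 is 99.6% but is printed as 99%). The example uses the Query 181-240 segment of the Q6WNQ5 hit, where 19 masked positions and one G-to-R substitution leave 40 identities out of 60.

```python
def blast_percent(identities: int, length: int) -> int:
    """Whole-number percentage, truncated rather than rounded."""
    return (100 * identities) // length


def percent_identity(query: str, sbjct: str) -> tuple[int, int, int]:
    """Return (identities, aligned length, whole-number percent) for one aligned segment.

    Gap characters ('-') and query residues masked to 'X' are never counted as identities.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned segments must have equal length")
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q not in "-X")
    return identities, len(query), blast_percent(identities, len(query))


# Query 181-240 / Sbjct 182-241 of the Q6WNQ5 hit (masked query positions shown as X).
query = "IVPHGGHYHYIP" + "X" * 19 + "NMQPSQLSYSSTASDNNTQSVAKGSTSKP"
sbjct = "IVPHRGHYHYIPKSDLSASELAAAKAHLAGKNMQPSQLSYSSTASDNNTQSVAKGSTSKP"

if __name__ == "__main__":
    print(percent_identity(query, sbjct))  # (40, 60, 66)
    for ident, length in [(565, 567), (743, 779), (239, 240), (527, 528)]:
        print(f"{ident}/{length} ({blast_percent(ident, length)}%)")
```

Summed over all 13 segments of the Q6WNQ5 hit, the 19 and 16 masked positions plus the single substitution account for the 36 non-identities behind 743/779.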



http://tw.expasy.org/cgi-bin7blast.pl 



6/21/05 



ExPASy BLAST2 Interface 



Page 1 of 1 



tr|Q6WNQ7|Q6WNQ7_STRPN Surface protein BVH-3 [bvh-3] [Streptococcus pneumoniae] 1039 AA

Score = 475 bits (1222), Expect = e-133
Identities = 239/240 (99%), Positives = 239/240 (99%)

Query: 1     EVPILEKENQTDKPSILPQFKRNKAQENLKLDEKVEEPKTSEKVEKEKLSETGNSTSNST 60
             EVPILEKENQTDKPSILPQFKRNKAQEN KLDEKVEEPKTSEKVEKEKLSETGNSTSNST
Sbjct: 800   EVPILEKENQTDKPSILPQFKRNKAQENSKLDEKVEEPKTSEKVEKEKLSETGNSTSNST 859

Query: 61    LEEVPTVDPVQEKVAKFAESYGMKLENVLFNMDGTIELYLPSGEVIKKNMADFTGEAPQG 120
             LEEVPTVDPVQEKVAKFAESYGMKLENVLFNMDGTIELYLPSGEVIKKNMADFTGEAPQG
Sbjct: 860   LEEVPTVDPVQEKVAKFAESYGMKLENVLFNMDGTIELYLPSGEVIKKNMADFTGEAPQG 919

Query: 121   NGENKPSENGKVSTGTVENQPTENKPADSLPEAPNEKPVKPENSTDNGMLNPEGNVGSDP 180
             NGENKPSENGKVSTGTVENQPTENKPADSLPEAPNEKPVKPENSTDNGMLNPEGNVGSDP
Sbjct: 920   NGENKPSENGKVSTGTVENQPTENKPADSLPEAPNEKPVKPENSTDNGMLNPEGNVGSDP 979

Query: 181   MLDPALEEAPAVDPVQEKLEKFTASYGLGLDSVIFNMDGTIELRLPSGEVIKKNLSDLIA 240
             MLDPALEEAPAVDPVQEKLEKFTASYGLGLDSVIFNMDGTIELRLPSGEVIKKNLSDLIA
Sbjct: 980   MLDPALEEAPAVDPVQEKLEKFTASYGLGLDSVIFNMDGTIELRLPSGEVIKKNLSDLIA 1039
tr|Q6WNQ7|Q6WNQ7_STRPN Surface protein BVH-3 [bvh-3] [Streptococcus pneumoniae] 1039 AA

Score = 1059 bits (2738), Expect = 0.0
Identities = 527/528 (99%), Positives = 527/528 (99%)

Query: 1     MKDLDKKIEEKIAGIMKQYGVKRESIVVNKEKNAIIYPHGDHHHADPIDEHKPVGIGHSH 60
             MKDLDKKIEEKIAGIMKQYGVKRESIVVNKEKNAIIYPHGDHHHADPIDEHKPVGIGHSH
Sbjct: 512   MKDLDKKIEEKIAGIMKQYGVKRESIVVNKEKNAIIYPHGDHHHADPIDEHKPVGIGHSH 571

Query: 61    SNYELFKPEEGVAKKEGNKVYTGEELTNVVNLLKNSTFNNQNFTLANGQKRVSFSFPPEL 120
             SNYELFKPEEGVAKKEGNKVYTGEELTNVVNLLKNSTFNNQNFTLANGQKRVSFSFPPEL
Sbjct: 572   SNYELFKPEEGVAKKEGNKVYTGEELTNVVNLLKNSTFNNQNFTLANGQKRVSFSFPPEL 631

Query: 121   EKKLGINMLVKLITPDGKVLEKVSGKVFGEGVGNIANFELDQPYLPGQTFKYTIASKDYP 180
             EKKLGINMLVKLITPDGKVLEKVSGKVFGEGVGNIANFELDQPYLPGQTFKYTIASKDYP
Sbjct: 632   EKKLGINMLVKLITPDGKVLEKVSGKVFGEGVGNIANFELDQPYLPGQTFKYTIASKDYP 691

Query: 181   EVSYDGTFTVPTSLAYKMASQTIFYPFHAGDTYLRVNPQFAVPKGTDALVRVFDEFHGNA 240
             EVSYDGTFTVPTSLAYKMASQTIFYPFHAGDTYLRVNPQFAVPKGTDALVRVFDEFHGNA
Sbjct: 692   EVSYDGTFTVPTSLAYKMASQTIFYPFHAGDTYLRVNPQFAVPKGTDALVRVFDEFHGNA 751

Query: 241   YLENNYKVGEIKLPIPKLNQGTTRTAGNKIPVTFMANAYLDNQSTYIVEVPILEKENQTD 300
             YLENNYKVGEIKLPIPKLNQGTTRTAGNKIPVTFMANAYLDNQSTYIVEVPILEKENQTD
Sbjct: 752   YLENNYKVGEIKLPIPKLNQGTTRTAGNKIPVTFMANAYLDNQSTYIVEVPILEKENQTD 811

Query: 301   KPSILPQFKRNKAQENLKLDEKVEEPKTSEKVEKEKLSETGNSTSNSTLEEVPTVDPVQE 360
             KPSILPQFKRNKAQEN KLDEKVEEPKTSEKVEKEKLSETGNSTSNSTLEEVPTVDPVQE
Sbjct: 812   KPSILPQFKRNKAQENSKLDEKVEEPKTSEKVEKEKLSETGNSTSNSTLEEVPTVDPVQE 871

Query: 361   KVAKFAESYGMKLENVLFNMDGTIELYLPSGEVIKKNMADFTGEAPQGNGENKPSENGKV 420
             KVAKFAESYGMKLENVLFNMDGTIELYLPSGEVIKKNMADFTGEAPQGNGENKPSENGKV
Sbjct: 872   KVAKFAESYGMKLENVLFNMDGTIELYLPSGEVIKKNMADFTGEAPQGNGENKPSENGKV 931

Query: 421   STGTVENQPTENKPADSLPEAPNEKPVKPENSTDNGMLNPEGNVGSDPMLDPALEEAPAV 480
             STGTVENQPTENKPADSLPEAPNEKPVKPENSTDNGMLNPEGNVGSDPMLDPALEEAPAV
Sbjct: 932   STGTVENQPTENKPADSLPEAPNEKPVKPENSTDNGMLNPEGNVGSDPMLDPALEEAPAV 991

Query: 481   DPVQEKLEKFTASYGLGLDSVIFNMDGTIELRLPSGEVIKKNLSDLIA 528
             DPVQEKLEKFTASYGLGLDSVIFNMDGTIELRLPSGEVIKKNLSDLIA
Sbjct: 992   DPVQEKLEKFTASYGLGLDSVIFNMDGTIELRLPSGEVIKKNLSDLIA 1039
UniProtKB/TrEMBL entry Q9ANY1

Entry information
Entry name: Q9ANY1_STRPN
Primary accession number: Q9ANY1
Secondary accession number: Q7D4B6
Entered in TrEMBL in: Release 17, June 2001
Sequence was last modified in: Release 17, June 2001
Annotations were last modified in: Release 30, May 2005

Name and origin of the protein
Protein name: Pneumococcal histidine triad protein E [Precursor]
Synonym: Hypothetical protein SP1004
Gene name: Name: phtE; OrderedLocusNames: SP1004
From: Streptococcus pneumoniae [TaxID: 1313]
Taxonomy: Bacteria; Firmicutes; Lactobacillales; Streptococcaceae; Streptococcus.

References
[1] NUCLEOTIDE SEQUENCE.
DOI=10.1128/IAI.69.2.949-958.2001; PubMed=11159990;
Adamou J.E., Heinrichs J.H., Erwin A.L., Walsh W., Gayle T., Dormitzer M., Dagan R., Brewah Y.A., Barren P., Lathigra R., Langermann S., Koenig S., Johnson S.;
"Identification and characterization of a novel family of pneumococcal proteins (the Pht family) that are protective against sepsis.";
Infect. Immun. 69:949-958(2001).
[2] NUCLEOTIDE SEQUENCE.
STRAIN=ATCC BAA-334 / TIGR4;
DOI=10.1126/science.1061217; PubMed=11463916;
Tettelin H., Nelson K.E., Paulsen I.T., Eisen J.A., Read T.D., Peterson S.N., Heidelberg J.F., DeBoy R.T., Haft D.H., Dodson R.J., Durkin A.S., Gwinn M.L., Kolonay J.F., Nelson W.C., Peterson J.D., Umayam L.A., White O., Salzberg S.L., Lewis M.R., H[...], Fraser C.M.;
"Complete genome sequence of a virulent isolate of Streptococcus pneumoniae.";
Science 293:498-506(2001).

Comments
None

Cross-references
EMBL: AF318956; AAK06761.1; Genomic_DNA.
EMBL: AE007403; AAK75121.1; -; Genomic_DNA.
PIR: H95115; H95115.
TIGR: SP1004; -.
InterPro: IPR006270; Strep_his_triad.
Pfam: PF04270; Strep_his_triad; 5.
TIGRFAMs: TIGR01363; strep_his_triad; 3.
ProtoMap: Q9ANY1.
PRESAGE: Q9ANY1.
ModBase: Q9ANY1.

Keywords
Complete proteome; Hypothetical protein; Signal.

Features
Key     From  To  Length  Description
SIGNAL     1  29      29  Potential.
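Feature coordinates in this table are 1-based and inclusive, so the SIGNAL feature (1 to 29) covers the first 29 residues. The following is a minimal sketch of extracting such a region in Python, where slices are 0-based and half-open. The helper name is hypothetical. The residues are the first 60 listed in this entry's sequence section. Because the feature is annotated as Potential, the cleavage point is a prediction, not an experimentally mapped site.

```python
# First 60 residues of Q9ANY1, as listed in the entry's sequence section.
Q9ANY1_N_TERMINUS = (
    "MKFSKKYIAA" "GSAVIVSLSL" "CAYALNQHRS" "QENKDNNRVS" "YVDGSQSSQK" "SENLTPDQVS"
)


def feature_region(seq: str, start: int, end: int) -> str:
    """Return residues start..end given 1-based, inclusive feature coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("feature coordinates fall outside the sequence")
    # Shift the start down by one; the inclusive end maps directly to Python's exclusive stop.
    return seq[start - 1:end]


if __name__ == "__main__":
    signal = feature_region(Q9ANY1_N_TERMINUS, 1, 29)
    print(signal, len(signal))  # MKFSKKYIAAGSAVIVSLSLCAYALNQHR 29
    print(Q9ANY1_N_TERMINUS[29])  # S, the first residue after the predicted signal
```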

Sequence information
Length: 1039 AA  Molecular weight: 114631 Da  CRC64: 81A563FC806625C4 [This is a checksum on the sequence]
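The CRC64 field is a 64-bit cyclic redundancy check over the one-letter sequence string. The following is a minimal sketch, assuming the CRC-64-ISO variant used for Swiss-Prot checksums: reflected polynomial 0xD800000000000000, initial value 0, and no final XOR. Biopython provides an equivalent function as `Bio.SeqUtils.CheckSum.crc64`. The full 1039-residue sequence is not reproduced in this section, so the printed value 81A563FC806625C4 is not re-derived here. The sketch only shows how such a value is computed.

```python
POLY = 0xD800000000000000  # reflected form of x^64 + x^4 + x^3 + x + 1


def _make_table() -> list[int]:
    """Precompute the CRC of every possible byte value (reflected, right-shifting)."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc64(sequence: str) -> str:
    """Table-driven CRC-64 over the ASCII sequence, as 16 upper-case hex digits."""
    crc = 0
    for byte in sequence.encode("ascii"):
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return f"{crc:016X}"


if __name__ == "__main__":
    # Checksum of the first 20 residues only, not of the whole entry.
    print(crc64("MKFSKKYIAAGSAVIVSLSL"))
```

The zero initial value makes this checksum weak for very similar sequences, which is why it suits a quick identity check but not collision-free identification.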






        10         20         30         40         50         60
MKFSKKYIAA GSAVIVSLSL CAYALNQHRS QENKDNNRVS YVDGSQSSQK SENLTPDQVS
        70         80         90        100        110        120
QKEGIQAEQI VIKITDQGYV TSHGDHYHYY NGKVPYDALF SEELLMKDPN YQLKDADIVN
       130        140        150        160        170        180
EVKGGYIIKV DGKYYVYLKD AAHADNVRTK DEINRQKQEH VKDNEKVNSN VAVARSQGRY
       190        200        210        220        230        240
TTNDGYVFNP ADIIEDTGNA YIVPHGGHYH YIPKSDLSAS ELAAAKAHLA GKNMQPSQLS
       250        260        270        280        290        300
YSSTASDNNT QSVAKGSTSK PANKSENLQS LLKELYDSPS AQRYSESDGL VFDPAKIISR
       310        320        330        340        350        360
TPNGVAIPHG DHYHFIPYSK LSALEEKIAR MVPISGTGST VSTNAKPNEV VSSLGSLSSN
       370        380        390        400        410        420
PSSLTTSKEL SSASDGYIFN PKDIVEETAT AYIVRHGDHF HYIPKSNQIG QPTLPNNSLA
       430        440        450        460        470        480
TPSPSLPINP GTSHEKHEED GYGFDANRII AEDESGFVMS HGDHNHYFFK KDLTEEQIKA
       490        500        510        520        530        540
AQKHLEEVKT SHNGLDSLSS HEQDYPSNAK EMKDLDKKIE EKIAGIMKQY GVKRESIVVN
       550        560        570        580        590        600
KEKNAIIYPH GDHHHADPID EHKPVGIGHS HSNYELFKPE EGVAKKEGNK VYTGEELTNV
       610        620        630        640        650        660
VNLLKNSTFN NQNFTLANGQ KRVSFSFPPE LEKKLGINML VKLITPDGKV LEKVSGKVFG
       670        680        690        700        710        720
EGVGNIANFE LDQPYLPGQT FKYTIASKDY PEVSYDGTFT VPTSLAYKMA SQTIFYPFHA
       730        740        750        760        770        780
GDTYLRVNPQ FAVPKGTDAL VRVFDEFHGN AYLENNYKVG EIKLPIPKLN QGTTRTAGNK
       790        800        810        820        830        840
IPVTFMANAY LDNQSTYIVE VPILEKENQT DKPSILPQFK RNKAQENLKL DEKVEEPKTS